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ABSTRACT 



A method for automatically defining aggregates for use in a 
datamart is described. The datamart includes fact and dimen- 
sion tables. The method comprises accessing a schema 
description and an aggregates description for the datamart. 
The schema description specifies a schema, which in turn, 
defines the relationships between the fact tables and dimen- 
sion tables of the datamart. The aggregates description 
specifies the aggregates, which define, from the schema 
definition, which aggregate tables are to be created from the 
fact tables and dimension tables in the datamart. The data in 
the aggregates correspond to the pre-computed results of 
specific types of queries. In response to a query, the aggre- 
gates can be searched to determine an appropriate aggregate 
to use in response to that query. The schema description is 
used to create a first set of commands to create and populate 
the fact and dimension tables. Additionally, a second set of 
commands to create, populate and access, the aggregates are 
also created from the aggregates description. Some of the 
commands of the first set of commands are executed causing 
the creation and population of the tables. Some of the 
commands of the second set of commands are executed 
causing the creation of the aggregate tables. Some of the 
remaining commands of the second set of commands are 
executed to populate the aggregate tables from the populated 
fact and dimension tables. 
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METHOD AND APPARATUS FOR CREATING 
AGGREGATES FOR USE IN A DATAMART 

CROSS REFERENCES TO RELATED 
APPLICATIONS 

This application relates to the following group of appli- 
cations. Each application in the group relates to, and incor- 
porates by reference, each other application in the group. 
The invention of each application is assigned to the assignee 
of this invention. The group of applications includes the 
following. 

U.S. patent application Ser. No. 09/073,748, entitled 
"Method and Apparatus for Creating a Well-Formed Data- 
base System Using a Computer," filed May 6, 1998, and 
having inventors Craig David Weissman, Greg Vincent 
Walsh and Eliot Leonard Wegbreit. 

U.S. patent application Ser. No. 09/073,752, entitled 
"Method and Apparatus for Creating and Populating a 
Datamart," filed May 6, 1998, and having inventors Craig 
David Weissman, Greg Vincent Walsh and Lynn Randolph 
Slater, Jr. 

U.S. patent application Ser. No. 09/073,733, entitled 
"Method and Apparatus for Creating Aggregates for Use in 
a Datamart," filed May 6, 1998, and having inventors Allon 
Rauer, Gregory Vincent Walsh, John P. McCaskey, Craig 
David Weissman and Jeremy A. Rassen. 

U.S. patent application Ser. No. 09/073,753, entitled 
"Method and Apparatus for Creating a Datamart and for 
Creating a Query Structure for the Datamart," filed May 6, 
1998, and having inventors Jeremy A. Rassen, Emile Litvak, 
abhi a. shelat, John P. McCaskey and Allon Rauer. 

COPYRIGHT NOTICE 

A portion of the disclosure of this patent document 
contains material which is subject to copyright protection. 
The copyright owner has no objection to the facsimile 
reproduction by any one of the patent disclosure, as it 
appears in the Patent and Trademark OfEce patent files or 
records, but otherwise reserves all copyright rights whatso- 
ever. 

THE FIELD OF THE INVENTION 

This invention relates to the field of databases. In 
particular, the invention relates to creating databases, and 
loading and accessing data in the databases. 

BACKGROUND OF THE INVENTION 

Many different types of databases have been developed. 
On line transaction processing (OLTP) databases are 
examples of typical databases used today. OLTP databases 
are concerned with the transaction oriented processing of 
data. On line transaction processing is the process by which 
data is entered and retrieved from these databases. In these 
transaction-oriented databases, every transaction is guaran- 
teed. Thus, at a very low level, the OLTP databases are very 
good at determining whether any specific transaction has 
occurred. 

Another type of database is a data warehouse or datamart. 
A datamart transforms the raw data from the OLTP data- 
bases. The transformation supports queries at a much higher 
level than the OLTP atomic transaction queries. A data 
warehouse or a datamart typically provides not only the 
structure for storing the data extracted from the OLTP 
databases, but also query analysis and publication tools. 
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The advantage of datamarts is that users can quickly 
access data that is important to their business decision 
making. To meet this goal, datamarts should have the 
following characteristics. First, datamarts should be consis- 

5 tent in that they give the same results for the same search. 
The datamart should also be consistent in the use of terms to 
describe fields in the datamart. For example, "sales" has a 
specific definition, that when fetched from a database, 
provides a consistent answer. Datamarts should also be able 

10 to separate and combine every possible measure in the 
business. Many of these issues are discussed in the following 
book, Ralph Kimball, The Data Warehouse Toolkit, John 
Whiley and Sons, Inc., New York, N.Y. (1996). 

Multi-dimensional datamarts are one kind of datamart. 

15 Multi-dimensional datamarts rely on a dimension modeling 
technique to define the schema for the datamart. Dimension 
modeling involves visualizing the data in the datamart as a 
multi-dimension data space (e.g., image the data as a cube). 
Each dimension of that space corresponds to a different way 

20 of looking at the data. Each point in the space, defined by the 
dimensions, contains measurements for a particular combi- 
nation of dimensions. For example, a three dimensional cube 
might have product, customer, and territory dimensions. Any 
point in that cube, defined by those three dimensions, will 

25 represent data that relates those three dimensions. 

The data in the datamart is organized according to a 
schema. In a dimensional datamart, the data is typically 
organized as a star schema. At the center of a standard star 
schema is a fact table that contains measure data. Radiating 

30 outward from the fact table, like the points of a star, are 
multiple dimension tables. Dimension tables contain 
attribute data, such as the names of customers and territories. 
The fact table is connected, or joined, to each of the 
dimension tables, but the dimension tables are connected 

35 only to the fact table. This schema differs from that of many 
conventional relational databases where many tables are 
joined. The advantage of such a schema is that it supports a 
top down business approach to the definition of the schema. 

^ Present datamarts have a number of drawbacks that are 
now discussed. First, datamarts are typically difficult to 
build and maintain. This is because of the requirements that 
they be consistent and flexible. A related drawback of 
present day datamarts is that they do not allow the consult- 

45 ants of the datamart to make changes to the schema simply 
and easily. Because datamarts support very high level que- 
ries about the business processes in the business, they 
require a great deal of consistency in the use of data from the 
OLTP systems. Additionally, the datamarts need to be very 

50 flexible to address changes in the types of high level queries 
supported. Changing typical datamarts require the changing 
of hundreds, or potentially thousands, of lines of SQL code. 
For example, if a fact column is added to a fact table, the 
change propagates throughout the datamart. These changes 

55 are typically implemented by hand, a very time consuming 
and error prone process. As a result of the hand coding 
involved, it is quite possible to construct the database in an 
arbitrary fashion that does not conform to good rules for 
constructing datamarts. Thus, well-formed datamarts may 

, rt not result. 

ou 

Thus an improved data warehousing technology is 
desired. 

A SUMMARY OF THE INVENTION 

65 One embodiment of the invention includes a method for 
automatically defining aggregates for use in a datamart. The 
datamart includes fact and dimension tables. The method 
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comprises accessing a schema description and an aggregates 
description for the datamart. The schema description speci- 
fies a schema, which in turn, defines the relationships 
between the fact tables and dimension tables of the datamart. 
The aggregates description specifies the aggregates, which 5 
define, from the schema definition, which aggregate tables 
are to be created from the fact tables and dimension tables 
in the datamart. The data in the aggregates correspond to the 
pre-computed results of specific types of queries. In 
response to a query, the aggregates can be searched to 10 
determine an appropriate aggregate to use in response to that 
query. The schema description is used to create a first set of 
commands to create and populate the fact and dimension 
tables. Additionally, a second set of commands to create, 
populate and access, the aggregates are also created from the 15 
aggregates description. Some of the commands of the first 
set of commands are executed causing the creation and 
population of the tables. Some of the commands of the 
second set of commands are executed causing the creation of 
the aggregate tables. Some of the remaining commands of 20 
the second set of commands are executed to populate the 
aggregate tables from the populated fact and dimension 
tables. 

In some embodiments, the aggregates description speci- 
fies a set of aggregate groups which define the aggregates to 25 
be created for one or more dimension tables. 

Although many details have been included in the descrip- 
tion and the figures, the invention is defined by the scope of 
the claims. Only limitations found in those claims apply to 
the invention. 

A BRIEF DESCRIPTION OF THE DRAWINGS 

The figures illustrate the invention by way of example, 
and not limitation. Like references indicate similar elements. 35 

FIG. 1 illustrates a datamart system representing one 
embodiment of the invention. 

FIG. 2 illustrates an embodiment of a method of defining 
the datamart, loading the datamart, and then querying the 
data. 40 

FIG. 3 illustrates a schema used in the system of FIG. 1 
to define scbemas for the datamart. 

FIG. 4 illustrates a schema used in the data extraction and 
loading process. 

FIG. 5 illustrates a runtime schema including aggregates. 

FIG. 6 illustrates a query mechanism and user interface 
schema. 

FIG. 7 through FIG. 29 describe a user interface that can 
be used to define a schema, build a datamart, load the 50 
datamart, and query the datamart. 

FIG. 30 through FIG. 36 describe a user interface that can 
be used by a consultant to set up the query interface for a 
user and to provide the reporting interface. 

THE DESCRIPTION 55 
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Introduction to the Description 

The following describes a system according to various 
embodiments of the invention. Generally, the system allows 
a consultant to define a well-formed datamart. The system 
includes tables and columns that conform to the definition of 
the datamart. The system also includes additional columns 
for foreign key tracking, source system key mapping, time 
and date tracking. The system has automatic indexing. The 
system enforces typing information about the data stored in 
the datamart. These additional features cause the datamart to 
operate in a consistent manner. One benefit of such consis- 
tent operation is that results are consistent in meaning from 
query to query. 
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Focusing on the datamart creation, the system allows a 
consultant to build a datamart from a schema definition and 
a definition of the sources of the data. From the schema 
definition, the system automatically builds the tables needed 
in the datamart. Also, from the schema definition, and the 5 
sources definition, the system can automatically extract the 
data from those sources. Depending on the semantic mean- 
ing of the data, as defined by the schema definition, the 
system automatically converts the data from the sources into 
forms that are readily usable in the datamart. Once the 10 
datamart has been created, and the data has been loaded, 
users can then perform queries on the data. 

As part of the datamart creation, the system allows the 
consultant to define aggregates for the datamart. The aggre- 
gates correspond to pre-computed query results for different 15 
types of queries. For example, an aggregate can be created 
for a query that asks for all sales, by region, by quarter. The 
corresponding aggregate table would include a set of rows 
that have the results for this query (e.g., each row includes 
the quarterly sales for each region). The aggregates are 20 
specified using the schema definition. This makes defining 
and changing aggregates relatively simple. 

To allow a user to query the datamart, the system includes 
an interface for defining what fields can be used by the user 
to query the datamart. Additionally, by allowing the con- 25 
sultant to define measure and related information, the system 
allows the consultant to specify how the results are to appear 
to the users. 

The following description first presents a system level 
view of primarily one embodiment. Then, an example use of 
the system is presented. Next, the metadata used in the 
system is described. This metadata description is broken into 
four parts: a top level description of the metadata used in 
defining schemas, a description of the metadata used during 
the extraction, a description of the metadata used while the 35 
datamart is running, and a description of the query interface 
metadata. Next, an example set of user interface screen shots 
illustrates how consultants can quickly and efficiently define 
schemas, aggregates, and query interfaces, and how users 
can query the datamart. Next, additional alternative embodi- 
ments are described. 

Definitions 

Datamart or Data Warehouse — is a database. 

45 

Schema — is a description of the organization of data in a 
database. Often, the schema is defined using a data definition 
language provided by a database management system. More 
abstractly, the schema can be the logical definition of a data 
model for use in a database. ^ 

Metadata — is data that defines other data. This is not the 
actual data in the datamart, but is the data that defines the 
data in the datamart. 

Constellation— a grouping of dimension definitions, fact 
definitions, like-structured facts (all facts in a constellation 55 
have the same dimensional foreign keys), or stars, and other 
metadata definitions. Often the grouping relates to a busi- 
ness process (e.g., sales). 

Fact Table — the central table of a star schema. It stores the 
numeric measurements of the business that is supplying the 60 
information to the datamart. 

Measurement — is a piece of data in a fact table, or an 
arithmetic combination of data. 

Dimension — the tables that link to the fact table in a star 
schema. The tables store the descriptions of the dimensions 65 
of the business. Examples of dimensions are product and 
territory. 



Attributes — are the fields of a dimension table (e.g., 
product name, country name). 

User — any end user who would normally wish to query a 
datamart, but would not usually be concerned with the 
implementation or maintenance of the datamart. 

Consultant — is a person responsible for the creation and 
maintenance of a datamart. 

Source System — is any computer system that holds the 
raw data used by the system. Examples of such source 
systems are OLTP database systems. 

Data Store — any data storage (physical or logical) from 
which data is received or to which data is stored. Examples 
of a data store are files, a database, etc, 

Computer — is any computing device (e.g., PC compatible 
computer, Unix workstation, etc.). Generally, a computer 
includes a processor and a memory. A computer can include 
a network of computers. 

Program — a sequence of instructions that can be executed 
by a computer. A program can include other programs. A 
program can include only one instruction. 

Datamart System 

FIG. 1 illustrates a datamart system representing one 
embodiment of the invention. The system supports the 
creation of a well-formed datamart. This system allows 
consultants to use metadata to define schemas for a datamart. 
From the definition of the schema, the system can automati- 
cally generate the tables in the datamart. Further, the system 
can automatically extract the data from the source systems, 
perform conversions on that data and populate the datamart. 
Tlie system supports the automatic creation and processing 
of aggregates from aggregate definitions. The system also 
supports the creation of the query mechanisms from query 
definitons. 

The following description first lists all the elements of 
FIG. 1, then describes each of those elements, and then 
discusses how those elements operate together. 
System Element List 

FIG. 1 includes the following elements: source systems 
110, a system 100, a web server 186, a consultant computer 
190, and a user computer 180. The system 100 includes the 
metadata 160, an enterprise manager 102, an extraction 
program 120, staging tables 130, a semantic template con- 
version program 140, a datamart 150, an aggregate builder 
170, and a query and reporting program 104. The metadata 
160 includes the following data: schema definitions 161, 
connectors 162 (connectors are also referred to as 
extractors), semantic definitions 163, source system infor- 
mation 164, aggregate information 167, measurement infor- 
mation 168, and query/reporting information 169. The user 
computer 180 is shown running a browser 182. The browser 
182 includes a query/results interface 184. The consultant 
computer 190 shows the enterprise manager interface 192 
which shows the metadata organization of the system 100. 
System Element Descriptions 

The following describes the metadata 160, then the other 
elements of the system 100, and finally, the elements that are 
external to the system 100. These elements are all described 
in greater detail below. 

Metadata Overview 

The metadata 160 includes many different types of data 
and information. This information can be broken down into 
information related to (1) the definition of the schema for the 
datamart 150, (2) the data needed during the extraction from 
the source systems 110 and loading of the datamart 150, and 
(3) the information used in the querying of the datamart 150 



03/31/2003, EAST Version: 1.03.0007 



6,161,103 

7 8 

and supplying the result sets. The relationships between the or ODBC) database. However, this physical structure of the 

elements of the metadata 160 are described in greater detail system 100 can be implemented in any number of ways, and 

below. However, the following provides brief descriptions the invention does not require this specific hardware con- 

of these elements. figuration. 

The schema definitions 161 hold the definition of the 5 The enterprise manager 102 is a program that is respon- 

schema for the datamart 150. Typically, a consultant, using sible for supporting the definition of the schema, and the 

the consultant computer 190, can interface with the enter- creation of the tables in the datamart 150 from the schema 

prise manager 102 to define the schema definition 161 for definitions 161. The enterprise manager 102 also controls 

the datamart 150. In particular, the consultant can use the the extraction program 120. (In some embodiments, the 

enterprise manager interface 192 to define a star schema for to extraction program 120 and the semantic template conver- 

the datamart 150. This star schema is organized around the sion program 140 are included in the enterprise manager 

business processes of the business for which the datamart is 102). During the execution of the extraction program 120, 

being created. What is important is that the consultant can the extraction program 120, the staging tables 130, the 

easily define a schema for the datamart 150 and that defi- semantic template conversion 140, and the datamart 150 are 

nition is kept in the schema definitions 161. From the 15 all used. The extraction program 120 uses the connectors 

schema definitions 161, not only can the tables in the 162 and the source system information 164 to extract the 

datamart 150 be generated, but also the automatic extraction information from the source systems 110. The extracted data 

and conversion of the data from the source systems 110 can is loaded into the staging tables 130. 

be performed, aggregates are set up, and a query mechanism The staging tables 130 are temporary tables used to hold 

is generated. 20 the source system data before performing any semantic 

The connectors 162, the semantic definitions 163, and the conversions on that data. The staging tables 130 also allow 

source system information 164, are all related to the extrac- for the conversion of the source system data prior to moving 

tion of the data from the source systems 110. The connectors the data into the datamart 150. 

162 define the access routines for extracting the source Once the staging tables 130 have been loaded, the seman- 

systems data 110. The semantic definitions 163 define how 25 tic definitions 163 can be accessed from the enterprise 

that extracted data should be converted when it is loaded into manager 102 to convert the information in the staging tables 

the datamart 150. The semantic definitions 163 provide 130 to predefined data semantics. These predefined data 

important advantages to the system 100. In particular, the semantics allow for powerful queries, consistency in the 

semantic definitions 163 allow for a simplified definition of definition of the meaning of the data in the datamart 150, and 

the datamart 150, consistent meaning of the data in the 30 allow for changes to be made to the schema. Generally, the 

datamart 150, and allow for complex changes to the schema semantic template conversion 140 takes data stored in the 

to be easily propagated to the datamart 150. The source staging tables 130, performs a conversion of that data 

system information 164 defines how to extract the data from according to a corresponding semantic definition (defined in 

the systems 110. the schema definitions 161), and populates the datamart 150 

The aggregate information 167 defines how data in the 35 with the converted data, 

datamart 150 is treated once it is extracted. The aggregate Importantly, the predefined data semantics substantially 

information 167 allows for the creation of aggregates. simplify the creation and population of the datamart 150. In 

Aggregates are aggregations of various fields of data in the previous systems, the consultant would have to implement 

datamart 150. Aggregates support more complex and pow- all of the data manipulation and population programs by 

erful queries to be executed on the datamart 150. The 40 hand. By selecting a particular semantic definition for a 

aggregates also improve the performance of the system particular fact, or dimension, in the schema, the consultant 

during the querying process and allow for time navigation of has automatically defined the access and manipulation for 

the data in the datamart 150. Time navigation is the process populating programs for that table. Allowing the consultant 

of creating backlog result sets by hopping through date to select a predefined data semantic not only reduces the 

aggregates from the beginning of time in the datamart 150 45 tedious coding previously required of the consultant, but 

to the present. also allows for the automatic insertion of foreign keys, 

The measurement information 168 and the query/ transaction types, date, and other information into the 

reporting information 169 support the querying of the data- schema, and therefore the datamart 150. This additional 

mart 150. A measure is a piece of numeric data in the information causes the datamart 150 to be well- formed, 

datamart 150 that is useful to a user. That is, individual fact 50 The aggregate builder 170, as mentioned above, aggre- 

columns from source systems can be very implementation gates data in the datamart 150 according to the aggregate 

specific. These columns may not correspond to what users information 167 and the schema definitions 161. The results 

would prefer to see. For example, a user may want to see a of the aggregate builder 170 allow for more powerful and 

net price added with a total cost. However, the fact table may faster queries to be performed on the datamart 150. 

only include the net price or the total cost. The measure 55 The query/reporting program 104 supports the querying 

information 168 allows the consultant to define the abstract of the datamart 150 and presents results of those queries. The 

notion of the calculation associated with the net price added query and reporting process 104 uses the measurement 

to the total cost. information 168 and the query and reporting information 

In some embodiments of the invention, the metadata 160 169, in addition to the schema definitions 161, to query the 

also includes security information. The security information 60 datamart 150 and provide that information to the web server 

defines the level of access for various users to the various 186. The query/reporting, information 169 includes filters 

tables and fields in the datamart 150. This security informa- and form definitions. The filters allow the user to filter 

tion automatically restricts access to that data. different fields out of the datamart 150. The forms allow the 

System Overview users to indicate which fields a user is particularly interested 

The system 100 can be implemented on a network of 65 in. 

computers running Windows NT and UNIX. The datamart The metadata 160, although including many different 

150 can be implemented on top of an Oracle (SQL Server, types of definitional data, importantly includes the schema 
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definition 161 and the semantic definitions 163. The enter- in the schema is to be derived. That is, the consultant defines 

prise manager 102 can use the schema definitions 161 to from which fields and tables the information is to be 

build the tables in the datamart 150. Through the combina- extracted from the source systems 110. The consultant also 

tion of these two pieces of metadata 160, the enterprise defines how that data is to be put into the datamart 150. That 

manager 102 can take data from a source system 110, 5 is, the consultant associates each piece of data with a 

perform semantic conversions on that data and populate the semantic meaning. This semantic meaning defines how the 

datamart 150. Thus, in some embodiments of the invention, data from the source system is to be manipulated and how 

the system includes only the schema definitions 161 and the it is to populate the datamart 150. At this point, the consult- 

semantic definitions 163. ant can also define the aggregates that can be used in the 

External Elements 10 datamart 150. 

The source systems 110, as defined above, represent large Once the datamart 150 has been defined, it can then be 

databases from which data for the datamart 150 is pulled. automatically built. At block 220, the enterprise manager 

Examples of such systems include large on fine transaction 102 generates table creation SQL statements according to 

processing (OLTP) systems. Typically these source systems me definition of the metadata. In one embodiment of the 

110 are relational databases having multiple tables and is invention, block 220 is accomplished by performing queries 

relations between those tables. The source systems 110 do 0 n the schema definitions 161 to generate the fact table 

not generally support powerful queries that provide high creation statements, the fact staging table creation 

level information about the business in which the source statements, the dimension table creation statements, the 

systems U0 are used. Thus, the system 100 is used to extract dimension staging table creation statements, and the dimen- 

the data from the source systems 110 and to provide an 20 s i on mapping table creation statements. These tables are 

improved schema for querying that data. In some described in greater detail below. From the results of these 

embodiments, the source systems 110 include non-relational queries, SQL CREATE TABLE statements are created, 

databases such as object databases. In other embodiments, Importantly, the schema definitions 161 provide the infor- 

the source systems 110 can include flat file systems or mation the enterprise manager 102 needs to build the 

combined relational and object databases. What is important 25 datamart 150. 

is the source systems 110 can provide data to the system 100 Note mat mis process C an also be used to modify the 

through the connectors 162 and the source system informa- schema of an existing datamart 150. Therefore, at block 220, 

tion 164. me SQL tables being created wifi cause the existing datamart 

The consultant computer 190 represents any computing 150 to be mo dified without losing the data in the datamart 

device that allows a consultant to access the system 100. 30 -^50 

(Access to the system 100 can be through a network.) What M ^ ^ ^ ente ^ er m ^ the table 

K important is that the consultant computer 190 allows ; the tion statements t0 the database upon which the data . 

consultant to interface with the enterprise manager 102. _* ■ u ■ *jt-u*j*u * *u * ui 

» T . , . i mart 150 is bemg created. That database creates the tables, 

Note that the enterprae manager 102 can run on the which co nd to the datamart ls0 Miel block ^ tne 

consultant computer 190. 35 u M * 4 * A / - , , 

^ r , , „ n s , , . build the datamart process 202 is complete. 

The user can access the web server 186 through the user - 1 



computer 180. In this example, the user computer 180 can 
access the web server 186 through an HTTP connection. The 
user computerl gO sends a file request to the web ser vex!86. 
This file-reqiiisLmpKsents^a req uest tor a~~cjuery ofj he 
d at am art JSjLJThe web server 186 ninsaJav a program upon 
receivingjhe re quest/The query anfr reporting program 104 
converts t he information from the Java program into a, query 
that foe^dataniar H.50 w ill understand. In one embodiment, 
the query/re porting p rogram lfrTconverts the query into a . 



set of SQL,statemer5sTTrre~SQtTstatements are run against 
the datamart 150. The results of the statements are processed 
and provided back to the user computer 180. v 



Now the extraction process 204 can be performed. The 
extraction process 204 is run on a periodic basis to load data 
from the source systems 110 into the datamart 150. This 
^ process can be run multiple times for the datamart 150. 
At block 260, the connectors 162 are used by the enter- 
prise manager 102, and in particular, they are used by the 
extraction program 120 to extract the data from the source 
systems 110. The connectors 162 can include SQL statement 
4 5 templates (not to be confused with semantic templates, as 
described below) for extracting data from the source systems 
110. The extraction program 120 uses these templates, in 
addition to the source system information 164, to generate 
SQL statements. These SQL statements are issued to the 
Example Method of Defining and Using the 5Q sourcc system 110 an d the results are loaded into the staging 

Datamart 130. (The staging tables 130 had been created as a 

FIG. 2 illustrates an embodiment of a method of defining result of block 230.) Once the staging tables have been 
the datamart 150, loading the datamart 150, and then access- loaded, the data can then be moved into the datamart 150. 
ing the data in the datamart 150. This example can be broken At block 270, the staging table data is moved into the 
into four subparts: a build datamart process 202, an extrac- 55 datamart 150 using the semantic definitions 163. The seman- 
tion and loading process 204, a build aggregates process tic definitions 163 are templates for converting the staging 
205, and a query and reporting process 206. This example tables 130 data according to predefined data semantics, 
can be implemented using the system 100. These predefined data semantics, as described below, pro- 

At block 210, a consultant uses the enterprise manager vide semantic meaning to the data being loaded from the 
102 to define the schema. The schema is defined using the 60 staging tables 130. Note that the data from the staging tables 
metadata 160. This process is illustrated in greater detail in 130, as processed by the semantic template conversion 140, 
FIG. 7 through FIG. 35. Generally, defining the schema is placed in the tables in the datamart 150. 
involves determining the business processes of the organi- Thus, the schema definition and the semantic definitions 
zation for which the system 100 is being implemented. The 163 are used to generate and populate the datamart 150 such 
consultant then defines the star schema for those business 65 that the datamart 150 is well-formed. Examples of the 
processes. The star schema has a fact table and a number of well-formedness of the datamart 150 are as follows. (1) Two 
dimensions. The consultant also defines from where the data columns related by a relational join will be from the same 
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domain. (2) If table A has a many-to-one relationship to table The following describes the meaning of the various 

B, then table A has a foreign key that corresponds to table graphical elements in FIG. 3 through FIG. 5. Each box in the 

B. (3) A many-to-many relationship, between two tables A figure represents a table having one or more attributes. A first 

and B, is always expressed by an associative table that is table having a diamond graphic extending to second table 

created in a uniform way. For each unique many-to-many 5 (with a dot on the end) indicates that that second table has 

relationship, a unique value is created in the associative table a foreign key pointing to the first table. This can be thought 

and reused whenever that many-to-many relationship of as a parent child relationship. 

occurs. Denormalization is always done correctly. (4) Pull- It is important to remember that FIG. 3 through FIG. 5 

ing information from one table to be put into another table, illustrate the schema of the system used to generate and run 

for access efficiency, is done correctly. Previous systems 10 the datamart 150. Rows in these tables define the schema for 

cannot guarantee such a well-formed database system use in the datamart 150. From these rows, create table, table 

because hand coding of the creation and population opera- query, etc., commands are created. These commands are 

tions is required. This hand coding can easily introduce used to create the tables in the datamart 150 and to access 

errors into datamart creation and population processes. that datamart. 

Once the extraction process 204 has completed, the aggre- 15 Also, as mentioned previously, the datamart 150 is well- 
gates can be built in the build aggregates process 205. The formed because, among other reasons, the system 100 
aggregates are tables of pre-calculated combinations of automatically includes additional columns in the table cre- 
dimensions and facts. Importantly, they greatly increase the ated in the datamart 150. For example, source system key, 
speed of queries. Generally, the aggregate definitions, stored foreign key, and time and date columns are automatically 
in the aggregate information 167, are accessed and built 2 o added (where appropriate). The rest of the elements of the 
using the aggregate definitions (which interface with the system can then rely on the existence of these columns. This 
schema definitions). At block 275, the aggregate builder 170 prevents, for example, the creation of an incoosistent 
accesses the metadata 160 to build the aggregates. Often, the schema where only some of the tables include date and time 
aggregate building is done at night. information. 

After aggregates are built, the querying and reporting 25 The following first lists all of the elements in FIG. 3 and 

process 206 can be performed. The querying and reporting then describes those elements and their relationships, 

process 206 can be performed anytime after the creation of Top Level Metadata List 

the datamart 150. Importantly, when an aggregate is created, FIG. 3 includes the following elements: a constellation 

the appropriate operation for that aggregate is used. For 302, a fact table 304, a dimension base 306, a semantic 

example, revenue elements are added to produce an 30 instance 308, a fact column 310, a fact aggregate operator 

aggregate, while daily account balances are averaged to 312, a fact column number 314, a fact dimension cleansing 

produce an aggregate. 316, a dimension role 320, a degenerative number 322, a 

At block 277, the consultant defines the query mechanism dimension role number 324, a dimension node 326, a 

schema for the system 100. In particular, the consultant dimension column 329, a dimension column number 321, a 

defines the query/reporting information 169 and the mea- 35 cleanse type 323, a cleanse map definition 327, a cleanse 

surement information 168. These two pieces of metadata map 325, a physical type 330, a transaction string 332, a 

160 allow the system 100 to report meaningfully consistent metacolumn 334, an actual table type 336, a dimension base 

information to users. Also, the consultant is not burdened type 328, a special dimension base 391, an aggregate key 

with having to hand create the possible reports. operator 392, an aggregate dimension type 393, an aggregate 

f At block 280, a q uery is generated. In one embodiment of 40 dimension 344, an aggregate group 342, an aggregate fact 

the invention the query is generated at th e query/repo rting 340, a aggregate dimension set 372, a dimension column set 

progr am Ifi jL In other embodiments, tEe query can be 370, a dimension column set definition 374, a fact index 380, 

generated at the user computer 180 through the HTTP, web a fact index definition 384, and a fact index number 382. 

server 186, Java coupling to the query/reporting p rogram Top Level Metadata Descriptions 

104. What is important here is that some query is g enerated 45 It is important to remember that the tables in FIG. 3 are 

that can be used to access the datamart 15 0. Importantly only used to define the schema in the datamart 150. Thus, a 

because the schema definitions 161 are available to the fact table 304 in FIG. 3 is not the actual fact table in the 

query and reporting program 104, the user can be presented datamart 150, but the definition of that fact table. Each row 

with forms from which a query can be easily and automati- m a tebls corresponds to an instance of that table, 

cally generated. 50 The constellation 302 defines the organization of the 

At block 290, the answer set (the results) is created by the schema in the datamart 150. It is the top level table in the 

datamart 150. This answer set is then propagated back schema definition, 

through the query/reporting program 104, and ultimately to Fact Related Tables 

the user computer 180. The results are formatted according ^ fact toWe 304 defines the metadata 160 table describ- 

to the query/reporting information 169. 55 m S a11 of me fact Mqs within a given constellation 302. The 

attributes of the fact table 304 include a build aggregates 

Top Level Metadata Schema flag> a clcansc fla& a constellation key, a description, a fact 

As noted in the background, multi-dimensional datamarts table key, a fact table name, and a truncate stage flag. Each 

use star schemas. The system 100 uses star schemas in a attribute corresponds to a column in the fact table 304. The 

larger organization that allows for the sharing of dimension 60 build aggregate flag indicates whether or not to build aggre- 

tables by sets of similar facts. This larger organization is gates for a particular fact on the next execution of the 

called a constellation. FIG. 3 illustrates a schema for the aggregate builder 170. The cleanse flag is a flag that is used 

schema definitions tables that support constellations. (The in many of the tables to obliterate the actual measures within 

schema of FIG. 3 is labeled the schema for schema defini- a table in the datamart 150 (particularly useful in demon- 

tions 300.) That is, FIG. 3 illustrates a schema used in the 65 strations of the system 100 where sensitive data would 

system 100 to define schemas for the datamart. FIG. 3 also otherwise be revealed). The constellation key points to the 

illustrates some of the aggregate information 167 schema. parent constellation 302 for a given fact table 304. The fact 
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table name is the name of the fact table used in constructing 
the corresponding physical table names in the datamart 150. 
The truncate stage flag is used to indicate whether or not to 
truncate the fact staging table on the next extraction. 

The fact column 310 lists all of the fact attributes within 5 
a single fact table 304. The fact column 310 includes a 
cleanse flag, a description, a fact aggregate operator, a fact 
column key, a fact column name, a fact column number, a 
fact table key, and a physical type. The fact aggregate 
operator is an SQL operator used to aggregate this fact 10 
column in the datamart 150. The fact column key is the 
primary key for the fact column. The fact column name is 
the physical name of the fact column. The fact column 
number counts and orders the number of columns in the fact 
table. The fact table key points to the fact table to which the 15 
corresponding fact column belongs. The fact table key 
points to the fact table to which the fact column belongs. The 
physical type is the database type for the fact column. This 
type is a logical type and provides for independence of 
implementation of the datamart 150 from the underlying 20 
database used. 

The fact column number 314 and the fact aggregate 
operator 312 are used by the fact column 310. These have 
already been described in the context of the fact column 310. 

The fact dimension cleanse table 316 has rows that 25 
indicate the dimension foreign keys in a fact that should be 
cleansed. The fact dimension cleanse table 316 includes a 
dimension role key, a fact dimension cleanse key, and a fact 
table key. The dimension role key indicates that this dimen- 
sion role 320 is part of the "group by" set for cleansing a fact 30 
table without distorting trends in the data. The fact dimen- 
sion cleanse key is the primary key for the fact dimension 
cleanse table 316. The fact table key is the fact table having 
its cleansing properties. 

Dimension Related Tables 35 

The dimension base 306 is the metadata 160 describing all 
the dimension tables that can be used in a given constellation 
302. These dimension bases can then be used in multiple 
constellations. The dimension base 306 includes the follow- 
ing attributes: an aggregate key operator, a cleanse flag, a 40 
description, a dimension base key, dimension base name, a 
dimension base type, and a truncate stage flag. The aggre- 
gate key operator is an SQL operator used by the aggregate 
builder 170 to build aggregates from a dimension. The 
cleanse flag and description act similarly to those attributes 45 
in other tables. The dimension base key is the primary key 
for the dimension base 306. The dimension base name is the 
name of the base dimension used in constructing real tables 
in the datamart 150. The dimension base type indicates the 
type of a dimension base (either default or special (special so 
includes "date" and "transaction type," which are used by 
the system 100). The truncate stage flag operates in the 
manner similar to other truncate stage flags. 

The dimension column 329 defines the list of dimension 
attributes that are valid for a single base dimension 306 and 55 
inherited by a dimension usage. The dimension column 329 
includes a cleanse label, a cleanse map key, a cleanse type, 
a description, a dimension base key, a dimension column 
key, a dimension column name, a dimension column 
number, a dimension number key, grouped by field, a 60 
physical type, a primary key, a time navigation field, and a 
default value. The cleanse label is a label presented to users 
after this column has been cleansed. The cleanse map key is 
for use when cleansing using value mapping. The cleanse 
map key indicates the mapping group to use. The cleanse 65 
type is the method for cleansing the dimension column 329. 
The description is for documenting the dimension column 
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329. The dimension base key is the numbered base in which 
the column resides. The dimension column key is the 
primary key for the dimension column 329. The dimension 
column name is the physical name of the column. The 
dimension column number is the count of the dimension 
columns (to prevent too many from being created in the 
datamart 150). The dimension node key is the aggregate 
hierarchy group in which the column resides. The "group 
by" field is used for special dimensions to indicate whether 
or not this column needs to be "grouped by" during the 
processing by the aggregate builder 170. The physical type 
is a logical database type for this dimension column 329. 
The primary key is used in special dimensions to indicate 
whether or not this column is the primary key. Trie time 
navigation field is for the date special dimension to indicate 
whether or not time navigation should use this field. The 
default value is the default value for the dimension column. 

The dimension column number 321 is a look up table for 
the valid number of dimension columns that can be created. 
The dimension column number counts the number of dimen- 
sion columns to make sure there are not too many being 
defined by the consultant. 

The dimension role 320 is a metadata 160 table that 
describes all of the dimension tables used in a constellation 
302. The dimension role 320 includes a constellation key, a 
degenerative number, a description, a dimension base key, a 
dimension role key, a dimension role name, and a dimension 
role number. The constellation key points to the constella- 
tion 302 in which the dimension role 320 resides. The 
degenerative number defines the order of degenerate col- 
umns within fact tables in a constellation. The description is 
a documentation field for describing a dimension role. The 
dimension base key is the dimension base that this dimen- 
sion role refers to. The dimension role key is the primary key 
for the dimension role 320. The dimension role name is the 
name of the dimension role and is used when constructing 
the foreign keys in the fact tables in the datamart 150. The 
dimension role number defines the order of the dimension 
roles within a constellation. That is, a constellation may have 
multiple dimension roles and the dimension role number 
allows for an ordering of those dimension roles. 

The dimension node 326 is a table used for defining and 
grouping hierarchical dimension attributes that are used by 
the aggregate builder 170. The dimension node includes a 
dimension base key, a dimension note key, a node name, a 
node number, and a parent node number. The dimension 
base key points to the dimension base being defined. The 
dimension node key is the primary key for the dimension 
node. The node name is the logical name for the aggregate 
hierarchy group being defined. The node number is the 
logical number for the aggregate hierarchy group being 
defined. The parent node number is a logical number for the 
parent of the aggregate hierarchy group being defined. 

The degenerative number 322 and the dimension role 
number are defined as described in the dimension role 320. 

The dimension base type 328 is defined as described in 
relation to the dimension base 306. 

Semantic Instance Table 

The semantic instance 308 is a single record that repre- 
sents the manner in which a fact or dimension table is 
extracted from staging tables, manipulated, and then used to 
populate the corresponding table in the datamart 150. The 
semantic instance 308 includes an extraction node key, 
dimension base key, a fact table key, a semantic instance key, 
and a semantic type key. The extraction node key points to 
the extraction node that a particular semantic instance 
belongs to. The dimension base key is the dimension base 
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table owning this semantic instance. The fact table key and a special dimension base key. The "always include an 

points to the fact table owning this semantic instance. Only aggregate" field indicates whether or not this dimension 

one of the dimension base key and the fact table key is filled table must always be included in all aggregates. The default 

in for a semantic instance 308 because the semantic instance aggregate dimension type is the default manner in which this 

can only be applied to one or the other. The semantic 5 dimension is included in aggregate groups. The dimension 

instance key is a primary key for the semantic instance 308. base key is the one to one relationship to a dimension base. 

The semantic type key is the indicator of the type of The list order in fact is the order in fact tables that the foreign 

transformation necessary to construct this type of semantic key to this table will be listed. The physical type of key is 

instance in the datamart 150. the logical database type that foreign keys in the fact tables 

Aggregate Related Tables 10 that point to this special dimension will be. The index flag 

The aggregate builder 170 is a program that uses the is used in indexing. The special dimension base key is the 

aggregate tables and the schema definitions 161 to build primary key for the special dimension base 391. 

aggregates. Often this will be done on a nightly basis. Data Store Related Tables 

The aggregate group 342 defines a set of aggregates to be The physical type 330 defines a look up table of logical 

built for a constellation. An aggregate group 342 will cause 15 data types that are relational database management system 

a combinatorial creation of many aggregate tables in the (RDBMS) independent. The physical type 330 is a logical 

datamart 150. The consultant defines for which dimensions data type that works across various source systems storage 

aggregates are to be built (e.g., the consultant will define that types. The physical type 330 includes a database physical 

one, none, all, etc. columns of a dimension are to be type, a default value, and a special type, 

aggregated on in an aggregate group). The aggregate filter- 20 The translation string 332 defines a list of strings that are 

ing done by the query and reporting program 104 will select translated for different RDBMS 's. The translation string is a 

the most appropriate aggregates for a given query. logical string that can be converted to specific strings for 

The aggregate group 342 includes an aggregate group key, different storage types. Each storage type would correspond 

an aggregate group name, a constellation key, a default flag, to a different source system 110. 

a description, and an enabled field. The aggregate group key 25 Cleansing Related Tables. 

is the primary key for the aggregate group 342. The aggre- ^ cleanse type 32 3 is a look up table to indicate how to 

gate group name is the logical name of this aggregate group. ckanse a ^^5^ column. 

A constellation key points to the constellation in which this ^ , „ . • f • rool 

■ , ^ , , u D j - . . The cleanse map 325 is a mapping table tor mapping real 

aggregate group resides. The default flag indicates whether „ . , „ tu i^e [J^„a^ „ 

■ . • 1 j * n -!? • A 1, 4 . names to cleanse names. The cleanse map 325 includes a 

or not this group is the default group within a constellation. 30 , , , . , . 1 j 1 

_ - . 6 *\ j j * * 11 cleanse map key, which is the primary key, and a cleanse 

Default groups have facts and dimensions automatically i_- u • *u c / c * c 

1 j 1 4 j rn j ... map name, which is the name of a set of mapping pairs for 

added to them. They can also not be deleted. The description « r e ^ U1; , 

* - * the purpose or scrambling data, 

contains the documentation for this aggregate group. The _5 , , - .7. , G *u j * -i c u ♦ 

enabled field indicates whether or not the aggregate builder U ^ C K ma , P ^TT^a d ^ eS , the detads ° f ™ h 

170 will actually build this group. 35 should be mapped to which fields. Tie cleanse map defim- 

/ c * 4 ui i!ia / i + i_ 1 u- f Hon 327 includes cleanse map character ID, a cleanse integer 

The aggregate fact table 340 tracks the membership of a , _ . . * . . ~ 

r . -.I.* * rji. i c 44 wi 1AH ID, a cleanse map definition key, a cleanse map key, and a 

fact within an aggregate group. The aggregate fact table 340 ' . * t u \ m ■ u ♦ . 

• 1 j * cZl\ * i j cleanse value. The cleanse character ID is a character value 

includes an aggregate tact key, an aggregate group key, and - ™ , . A 

c 4 4 i-i i rfL * * * i • *iT • i for indexing into this mapping group. The cleanse integer ID 

a fact table key. The aggregate fact key is the primary key . ■ i * • j • • * *<• ■ - 

f t f ♦ -ru * ,1 1™, •„ *ul * n is a numeric value for indexing into this mapping group. The 

for the aggregate fact 340. The aggregate group key is the 40 , ■ t • i r fT f 

♦ u • j u 4U JL ; f , n cleanse map definition key is the primary key for the cleanse 

aggregate group being defined by the aggregate fact. The , - . . r« , 7 , • . 

fact table key points fo the fact table that is being made a ma ? cleanse ma P ke J * the ** «° 

member of the rou which this particular cleanse map definition belongs. And 

m ^nu r ° e S^yP- ^ cleanse map value is the translation value after the 

The aggregate dimension 344 indicates the membership . ^ 

of a dimension within a constellation in an aggregate group. 45 ma PP 10 S* 

The aggregate dimension 344 includes an aggregate dimen- Additional Tables 

sion key, an aggregate dimension type, an aggregate group The metacolumn 334 is a column that occurs by default in 

key, a dimension role key, and a special dimension base key. tables in the datamart 150. The metacolumn 334 includes an 

The aggregate dimension key is the primary key for the actual table type, a list order, a metacolumn key, a metacol- 

aggregate dimension 344. The aggregate dimension type 50 unm name > 311(1 a physical type. The actual table type 

indicates the manner in which this dimension (special or indicates the type of physical table in which this special 

role) will be included in the aggregate group. The aggregate column should appear. The list order is the order this column 

group key indicates the aggregate group being defined. The occurs m ^les of the appropriate type. The metacolumn 

dimension role key points to the dimension role being key is the primary key. The metacolumn name is the physical 

included. It is possible that this key is null. The special 55 name of the column when it is used. The physical type is the 

dimension base key indicates the special dimension being logical data type for this column. 

included. The special dimension base key can also be null. The actual table type 336 is a look up table for actual table 

The aggregate key operator 392 is defined as described in types. Actual table types can be fact, dimension stage, fact 

the dimension base 306. stage, dimension map, or dimension. 

The fact aggregate operator 312 is a look up table of valid 60 The aggregate group 342, the aggregate fact 340, the 

fact aggregation operations. Each operator is an SQL opera- aggregate dimension set 372, the dimension column set 370, 

tor used to aggregate a fact column. the dimension column set definition 374, the fact index 380, 

A special dimension base 391 provides details about the fact index definition 384, and the fact index number 382 

special built-in dimensions in the system 100. The special are for future use and are therefore optional. Each of these 

dimension base includes an "always include an aggregate" 65 tables provides greater flexibility when defining the meta- 

field, a default aggregate dimension type, a dimension base data 160, improves the performance of the system 100, or 

key, a list order in fact, a physical type of key, an index flag, may otherwise enhances the system 100. 
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Top Level Metadata Use 

Now that all of the elements in FIG. 3 have been listed and 
described, their relationships and workings are now 
described. 

It is important to note that many of the tables in FIG. 3 are 
actually used in providing layers of abstraction to allow for 
the reuse of information and non-abstract tables. Therefore, 
a consultant will often only deal with only some of the tables 
in the FIG. 3. For the purposes of describing how the 
metadata 160 can be used to define a schema for the 
datamart 150, these grouping and levels of abstraction tables 
will be described where appropriate. 

Generally, a consultant will create a new datamart 150 by 
defining instances of the dimension bases 306, and constel- 
lations 302. Each instance corresponds to a row in the 
dimensions bases 306 table or the constellation 302 table. 
The constellation instances are defined by defining 
aggregates, dimensions, facts, measures, and ticksheets. The 
following describes the definition of a schema using the 
metadata 160. This corresponds to block 210 of FIG. 2. 

Beginning with the facts in a constellation, the consultant 
defines a fact table 304 row that will define the hub table in 
a star schema supported by the constellation. Again, it is 
important to remember that the fact tables in FIG. 3 are for 
definitional purposes, and are not the real fact tables in the 
datamart 150. A row in the fact column 310 holds the details 
of what columns will be created for place holders of actual 
values in a corresponding fact table. Thus, for each fact, the 
consultant defines the various fact columns. 

Once the facts have been defined, the consultant can then 
define the dimensions of the constellation. 

Remember that the dimension base 306 holds the infor- 
mation to define the actual dimensions of the tables in the 
datamart 150. The dimension role 320 allows for the reuse 
of the dimension base tables. Thus, different dimension roles 
can refer to the same dimension base. This provides an 
important feature of some embodiments of the invention 
where the same dimension bases can be used in multiple 
constellations or within the same constellation. The dimen- 
sion columns 329 define the columns on which queries can 
be performed in the datamart 150. The dimension node table 
326 helps relate the dimension columns 329. Thus, the 
consultant will have defined the basic schema for the data- 
mart 150. 

The aggregate group 342 defines how particular facts or 
dimensions are to be aggregated by the aggregate builder 
170. These aggregated facts provide much faster queries in 
the datamart 150. 

The cleansing map tables are for scrambling the data in 
the datamart 150 for presentations to people who want to see 
the functionality of the system 100, without having to reveal 
the actual data in the datamart 150. 

The special dimensions are the transaction type table and 
date values that are included in every fact table. Because this 
is included in every fact table, the system 100 can rely on the 
existence of the transaction type during the various stages of 
datamart 150 creation, modification, querying, and the like. 

Thus, the elements of FIG. 3 can be used to allow the 
consultant to define the schema definitions 161 for creating 
the tables in the datamart 150. 

Extraction Metadata 

The following describes the metadata 160 used in the 
extraction process 204. This metadata, represented as extrac- 
tion schema 400, is shown in FIG. 4. The extraction process 
focuses around the job and connector tables. In general, 
these tables define the various steps in extracting the source 
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system data into the staging tables 130 and performing the 
desired semantic conversions on that data. 
Extraction Metadata List 

FIG. 4 includes the following elements: a job 402, a job 

5 step 404, a system call 405, a connector 406, a connector 
time stamp 407, a connector step 408, a connector column 
latch 409, and an extraction group 411, an extraction note 
410, an SQL statement 420, and error handling type 413, an 
external table 422, an external column 424, the physical type 

10 330, the fact table 304, a debug level 415, the semantic 
instance 308, a semantic type 430, a dimension semantic 
type 432, a fact semantic type 434, the actual table type 336, 
a semantic type definition 436, an adaptive template 438, 
and adaptive template block 439, the dimension base 306, a 

15 job log 401, a connector store role 448, a store role 446, a 
statement type enabled 428, a store role allow 444, a data 
store 440, a source system 442, a file store 441, and Oracle 
store 454, a store version 452, and SQLServer store 456, and 
ODBC store 458, and a store type 450. 

20 Extraction Metadata Descriptions 
Job Related Tables 

The job 402 is a top level object for controlling the work 
flow during the extraction and loading process 204. The job 
402 includes a check databases field, a check tables field, a 

25 description, a label, an initial load flag, a job key, a job name, 
a log file width, a mail to on error, a mail to on success, and 
a truncate flag. The check databases field indicates whether 
or not an attempt should be made to log into all the data 
stores before executing the job. The check tables flag 

30 indicates whether or not to check for the existence of all the 
tables in the datamart 150 before executing the job. The 
description is for documenting the job (usually done by the 
consultant). The enabled flag indicates whether or not a 
particular job can be run. The initial load flag indicates 

35 whether or not to ignore all previous time stamped con- 
straints when running a particular job. The job key is the 
primary key for the job table 402. The job name is the 
internal name of the job. The log file width indicates how 
many characters wide to make rows in the log file output. 

40 The mail to on error, and the mail to on success indicate 
where E-mail messages should be sent after failure or 
success of the particular job. The truncate flag indicates 
whether or not to truncate any tables when running a job. 
The job log 401 is the locations where the running job is 

45 logged. That is, the location of the output that will be 
provided to the consultant indicating what occurred during 
the extraction (e.g., what errors occurred). The job log 401 
includes a data store key, a job key, a job log key, and a job 
store role. The data store key indicates the data store having 

50 a role defined within the job. The job key is a reference to 
the particular job, the job log key is the primary key for the 
job log 401. The job store role is the role being assigned to 
that particular job log 401. An example job store role is 
"<working directory>," indicating the path to the working 

55 director where job log files are store. 

The job step table 404 includes the detailed steps that 
make up a job. This includes connectors and system calls. 
The job step table 404 includes the following attributes: a 
connector key, a description, an enabled flag, a job key, a job 

60 step key, a list order, a phase, a job step type, and a system 
call key. The connector key indicates the connector being 
included in any particular job step. The connector key can be 
null. The description is for documenting the job step. The 
enabled flag indicates whether or not a particular step is 

65 enabled. The job key points to the job being defined by the 
job step 404. The job step key is the primary key. The list 
order indicates the order of a particular job step. The phase 
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also indicates the order of a particular step. By supporting 
both list order and phase, alternative embodiments of the 
invention can support parallel extraction. Steps in the same 
phase can then be executed simultaneously. The job step 
type indicates the type of the job step (which are defined in 5 
the job step type table 481). The system call key points to a 
system call included in a particular job step. The system call 
key can be null. 

The system call 405 is a table including external OS 
system calls. These external system calls can be used to 10 
perform any number of external system functions. The 
system call 405 includes the system call key, a command 
string, a description, a name, and an on-error type. The 
system call key is the primary key for the system call. The 
command string is the actual operating system command to 15 
be run as a result of the system call. The description is for 
documenting the system call. The name is the logical name 
of the system call being defined. The on error type is an 
indicator to point to what to do if the system call fails. 
Connector Related Tables 20 
The connector 406 defines a name and a description. The 
connector 406 is a grouping mechanism for extraction 
statements and a specification for input and output data 
stores. The description is used for documenting the connec- 
tor. The name is the logical name of the connector. The 25 
connector 406 represents an ordered collection of connector 
steps 408. 

The connector step 408 defines steps within a connector. 
The connector step table 408 includes the following 
attributes: a connector key, a connector step key, an enabled 30 
flag, an extraction node key, a list order, and a phase. The 
connector key points to the connector being defined. The 
connector step key is the primary key. The enabled flag 
indicates whether a particular connector step is enabled. The 
extraction node key points to an extraction node that is, the 35 
extraction group of statements and semantics that make up 
this connector step. The list order is the order of steps in the 
connector. The phase is also the order of the steps in the 
connector. 

The connector time stamp 407 relates to information 40 
about incremental extraction. An incremental extraction is 
where increments of the data in the source system 110 are 
extracted The connector time stamp includes a connector 
key, a connector time stamp key, current max date, a current 
max time stamp, a last max date, and a last max time stamp. 45 
The connector key points to the connector to which the 
connector time stamp applies. The connector time stamp key 
is a primary key. The current max date is an indicator of the 
proposed new system date of the last successful extraction. 
The current maximum time stamp is the proposed new SQL 50 
server time stamp field for the last successful extraction. The 
last maximum date is the system date of the last successful 
extraction. The last maximum time stamp is the SQL server 
time stamp field for the source system databases at the last 
successful extraction. 55 

The connector time stamp 407 is particularly useful when 
only updated data should be pulled from the source systems 
110. 

The connector column latch 409 defines information 
about incremental extraction based on a database column. 60 
The incremental extraction information is thus kept in the 
database and can be retrieved. The connector column latch 
409 includes the following attributes: a column name, a 
connector column latch key, a connector key, a current 
maximum value, a last maximum value, and a table name. 65 
The table name is the name in the input data store for the 
corresponding connector. The column name is the column 



name within that table. The connector column latch key is 
the primary key. The connector key points to the connector 
to which this latch applies. The current max value represents 
the proposed new maximum value for the incremental 
extraction. This number is pushed into the last maximum 
value if the currently executing extraction succeeds. The last 
maximum value is the maximum value that was extracted 
during the last run of the extraction. 

The connector store role 448 defines the usage of a data 
store for a particular connector. It indicates whether the data 
store is input or output. The connector store role points to the 
E lo connector and the data store. The connector store role 
also indicates the type of storage usage being defined for this 
connector (input or output). 

Extraction Group Related Tables 

The extraction group 411 defines a group of extraction 
steps. The extraction group 411 includes a description, and 
a name of the set of extraction steps. 

The extraction node 410 is a single node, or step, in the 
extraction tree. The extraction tree defines the order of 
extraction steps. An extraction node includes an extraction 
node type, a debug level, a debug level row, an enabled flag, 
an extraction group key, an extraction node key, a list order, 
and on error type, a parent extraction node key, and a phase. 
The extraction node type defines the type of extraction node 
(as defined in the extraction node type table 491). This 
relationship is important and allows for the conversion of 
data in the staging tables for use in the datamart 150. The 
debug level indicates how to debug a particular step during 
execution. The debug level row indicates which row to start 
debugging at for SQL statements. The enabled flag indicates 
whether or not to execute a particular SQL statement asso- 
ciated with the extraction node. The extraction group key 
points to, if not null, the name of the group. The extraction 
node key is a primary key. The list order and the phase define 
the order of the corresponding step within an extraction 
node's parent. The on error type indicates what to do if there 
is an error during the execution of the step associated with 
the extraction node. The parent extraction node key points to 
the parent extraction node of the present extraction node. 

SQL Statement Related Tables 

The SQL statement 420 defines a single step in an 
extraction run. A row in the SQL statement 420 table 
represents an SQL statement. The columns in the SQL 
statements match those in the corresponding dimension base 
definition or fact table definition. In one embodiment the 
consultant supplies the SQL source in SQL statements. 

The SQL statement 420 includes an extraction node key, 
a description, a dimension base key, and execute against 
input flag, an external table key, a fact table key, SQL source 
and SQL statement key, and an SQL statement name. The 
extraction node key points to the extraction node associated 
with a particular SQL statement 420. The description is for 
documenting the SQL. The dimension base key points to the 
dimension base for the corresponding SQL statement. The 
execution against input flag indicates whether or not to 
execute this SQL statement against the source or destination 
data store of the connector that is calling this SQL statement. 
The external table key points to the external table, if any, 
being extracted into. The fact table key points to the fact 
table, if any being extracted into. The SQL source is the 
actual SQL source to be executed during the SQL statement 
execution. The SQL statement key is the primary key. The 
SQL statement name is the logical name for ah extraction 
SQL statement. 

The statement type enabled 428 defines which RDBMS 
types use this extraction statement. The statement type 
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enabled 428 includes the following attributes. An SQL 
statement key, a statement type enabled key, a store role 
name, and a store type. The SQL statement key points to the 
SQL statement. The statement type enabled key is the 
primary key. The store role name is the role name for which 
this SQL statement should be used. The store type is the 
store type for which this statement should be executed. 

The external table 422 defines the user defined destination 
table for use during a multiphase extraction. This user 
defined destination table can be used to temporarily store 
data during an extraction. The external table 422 defines the 
physical name of the external table and whether or not to 
truncate this external table on the next job run. 

The external column 424 defines a column in a user 
defined extraction table. The external column 424 includes 
the attributes for documenting a particular external column, 
and the column name in the external table. A pointer to the 
external table, a list order of appearance in the external table, 
and a physical type of the logical database for this column 
are included in the external column 424. 

Error Handling 

The error handling type 413 is a look up table to define 
how to respond to a particular error. An error handling type 
can be a default action to take when an error occurs. 

The debug level 415 defines ways in which extraction 
steps can be debugged. The debug level includes a logical 
name for users to pick a debugging behavior. The default 
level also includes a default flag. 

Data Semantic Related Tables 

The semantic type 430 defines a set of predetermined 
semantic types for use in defining a schema. The semantic 
type includes a logical name for a particular transformation. 
Associated with the semantic type are a dimension semantic 
type 432 and a fact semantic type 434. The dimension 
semantic type table 432 defines the ways in which dimen- 
sion data in the staging tables 130 can be extracted and put 
into the datamart 150. Similarly, the fact semantic type 
defines the ways in which the information in the staging 
tables 130 can be put into the fact tables of the datamart 150. 
Both the fact semantic type 434 and the dimension semantic 
type 432 include pointers to an actual table type and are used 
to subset the full list of semantic types. 

Each semantic type 430 is made up of a semantic type 
definition. The semantic type definition table 436 defines the 
set of adaptive templates used in any given semantic type. 
The semantic type definition 436 includes the semantic type 
key that points to the semantic type 430 for a particular 
semantic type definition. The semantic type definition also 
points to the adaptive template*438 used. The semantic type 
definition also includes a list order for the ordering of the 
adaptive templates. 

The adaptive template 438 is a semantic transformation 
template (e.g., an SQL program) that is used in the extraction 
of the data in the staging tables 130 to turn all source data 
into transactional data. The adaptive template 438 includes 
attributes indicating a logical name for an adaptive program 
used within semantic transformations. 

The adaptive template block 439 defines the individual 
pseudo-SQL statements that make up a template. The adap- 
tive template block 439 has the following attributes: the 
adaptive template block key, an adaptive template key, a 
block name, a list order, an on error type, and SQL source. 
The adaptive template block key is the primary key. The 
adaptive template points to the adaptive template to which 
this block belongs. The block name is the internal logical 
name for this block of pseudo-SQL source code. The list 
order is the order of this block within the template. The on 
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error type indicates what to do if this block causes an SQL 
error when executed. The SQL source includes a template of 
pseudo SQL source code. This template is described in 
greater detail below. 

5 Data Store Related Tables 

The store type 450 defines the types of RDBMS's sup- 
ported by the system 100. 

The data store 440 defines a logical data source, or sink, 
used during the extraction. The data store 440 includes the 

to following attributes: the data store key, a data store flag, a 
description, a name, a source system key, and a store type. 
The data store key is the primary key. The datamart flag 
indicates whether or not this data store is the special data- 
mart store. Since the datamart 150 and the metadata 160 can 

15 reside in the same database or different, the data mark helps 
resolve the location of the datamart 150. The description is 
for documentation of the particular data store. The name is 
the logical name of the data store. The source system key 
points to the source system identifier to which this data store 

20 belongs. This allows five, and backup, source systems to 
share the same identifier. The store type indicates the store 
type of this data store. 

The source system 442 is a logical identifier for a source 
system 110 from which data can be pulled. This allows two 

25 physical databases to act as one master database and one 
backup, for example. The source system 442 includes a 
description attribute, a source system key and a source 
system name. The description is for documentation to 
describe the source system. The source system key is a 

30 primary key for this table. This number also becomes 
identified source system field in the staging tables 130 being 
filled. The source system names is a logical name for a 
source system 110 from which the system 100 is pulling 
data. 

35 The following store tables are subtypes of the data store 
table 440. They address specific data stores. 

The file store table 441 defines the files in which data can 
be stored. The file store 441 defines a directory and file name 
for each file. 

40 The Oracle store 454 defines information about particular 
Oracle databases. The Oracle store 454 includes the follow- 
ing attributes: a data store key, an instance name, an Oracle 
store key, a password, an SQL network name, a user name, 
and a version. Hie data store key is a one to one relationship 

45 key to the data store being defined. The Oracle store key is 
the primary key. The password and user name are used to 
access a specific Oracle system. The version number is the 
Oracle vendor version number. The SQL network name is 
the SQL net instance name. The store version is a version of 

50 the store type (the database vendor's version for example) 
that the system recognizes. The store version has a pointer 
to the store type being defined and also includes a version 
number attribute. 
The SQL server store table 456 defines details about an 

55 SQL server system. The SQL server store includes the 
following attributes: a data store key, a database name, a 
password, a server, and SQL server store key, a user name, 
and a version. The data store key is a one to one relationship 
to a data store entry. The database name is an SQL server 

60 database name (SSDEFAULT means the database in which 
this role resides). The password is the SQLServer password. 
$$ DEFAULT again means the password currendy logged 
into to read this data. The server is the SQLServer name. The 
SQL server store key is the primary key. The user name is 

65 the SQL server user name. The version is the vendor's 
version number of this SQL server. SSDEFAULT means use 
the default value for the current database being used. For 
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example, the database name means the database in which 
this role resides. 
Extraction Metadata Use 

The following describes the tables of FIG. 4 io the context 
of the extraction process 204. The job, the job step, and the 
connector, group extraction steps for extracting information 
from the source systems 110 and cause that information to 
be placed in the datamart 150. This organization allows for 
a very flexible extraction process. For example, where a two 
phase extraction is required, one connector could be used to 
extract the information from the source system 110, while a 
second connector could then be used to take this extracted 
data from an external table. 

The job defines the order of the execution of the connec- 
tors. The job also allows for the running of an external 
program, such as system call, between connector executions. 
Thus each, job step in a job can be a system call or a 
connector. 

The following is an example illustrating the organization 
of job. Assume that a consultant wants to extract information 
from a source system that provided a raw set of home 
addresses. A system call could be run as part of a job step. 
The system call would determine the zip codes associated 
with those addresses. The zip codes could then be included 
in the datamart 150. 

The relationship between the connector 406, the connec- 
tor step 408, and the extraction node 410 is that the con- 
nector 406 allows for the reuse of extraction nodes in 
multiple connectors (through the connector step). This rela- 
tionship is particularly important where similar connectors 
are created. For example, assume that a consultant wants to 
create a connector that runs some steps Monday through 
Friday and different steps on Saturday and Sunday. Most of 
the steps in the connector will be common, however some 
will be different. Through the use of the connector step, the 
consultant can reuse many of these connector steps in each 
connector. 

As noted before an extraction node can be a leaf or a 
grouping mechanism for extraction. It can correspond to an 
SQL statement extraction step or a semantic conversion step. 
During the definition of the schema, the consultant defines 
the specific SQL statements for extraction and the specific 
semantic instances for the facts and dimensions. 

An SQL statement is a single step in an extraction run that 
represents a data push or a data pull. The SQL source code 
dictates the action for a given extraction node. After the SQL 
statements are run, the staging tables 130 are ready. The 
semantic conversion of the data in the staging tables 130 can 
occur. 

Tlie semantic instance represents the use of a single 
generic template on one fact or dimension table. The seman- 
tic type associated with the semantic instance is one of a 
number of pre-defined recognized data meanings within the 
system 100 (e.g., an "order"). The semantic types corre- 
spond to programs for converting the data in the staging 
tables 130 into data for use in the datamart 150. An example 
of a semantic type is a "slowly changing dimension" type. 

The semantics types, as mention previously, are made up 
of a series of templates. These templates include tokens that 
can be replaced with information from the corresponding 
dimension base or fact table. An example of an adaptive 
template is one that would be used in re-indexing of a fact 
table. This could be used as the last step in the semantic 
transformation of facts. The re-indexing will help speed the 
operation of the datamart 150. Importantly, this same index- 
ing is performed for each fact table. No matter which 
semantic type is chosen for a given fact table, the same 
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indexing is performed. Thus, this adaptive template can be 
used in each semantic type through its semantic type defi- 
nition. 

The following describes the slowly changing dimension 

5 semantic type. (See Appendix A.) In this semantic type are 
an insert dimension and an index dimension adaptive tem- 
plate. Each adaptive template has a corresponding set of 
pseudo-SQL statements. During the semantic template con- 
version 140 this pseudo-SQL will be transformed into real 

io SQL source code. This is done by converting the pseudo- 
SQL tokens into actual dimension column names, etc. (the 
column names and table names are derived from the schema 
definitions 161). 

Thus, during the extraction, the extraction node associated 

IS with a particular semantic type instance is processed. This 
causes the adaptive template blocks associated with the 
semantic instance to be processed. The dimension informa- 
tion associated with that semantic instance, or the fact table 
information associated with the semantic instances, can then 

20 be used to replace the tokens with the specific information 
associated with that dimension or that fact. 

Some embodiments of the invention correspond to only 
one or more semantic templates and a computer readable 
media, a computer, an electromagnetic waveform, or the 

25 like. 

Further Discussion of Templates 

The following describes the pre-parsed template and the 
post-parsed results in greater detail. Each token begins with 
a $$. In the example template, for the slowly changing 

30 dimension semantic type, a number of tokens begin with $$ 
DIM KEY. Similarly, tokens appear that begin with 
$$FSTGTBL[]. In the post-parsed template, the dimension 
key tokens have been changed to cost center keys, account 
key, subaccount keys, etc. Note any tokens, and their sur- 

35 rounding text, that are not replaced are removed from the 
post-parsed text If more tokens need to be replaced then are 
available in the template, then an error flag will be set. In 
other embodiments of the invention, the templates are 
dynamically generated according to the number of columns 

40 defined in the schema definition 161. In other embodiments, 
templates are not used but the "post-parsed SQL" results are 
dynamically generated from the schema definitions 161 and 
the semantic instance types. 

In this example, the net price corresponds to a fact column 

45 in a fact table. This indicates that the table entitled SSA in 
the post-parsed example includes one fact called net price. 

In some embodiments, the pre-parsed templates include 
additional tokens to deal with specific data stores. For 
example, the "select into" statement is a token in the 

50 pre-parsed version. This compensates for whether the data 
store is in Oracle database or an SQL server. 

Another feature of the pre-parsed language is that 
" — #begin#*' is used to break the pre-parsed version into 
adaptive template blocks. 

55 Examples 

Appendix A illustrates semantic types that may be sup- 
ported and their corresponding adaptive template names. For 
example, the Pipelined semantic type is made up of, in this 
order, the map_Jeeys the pipe_state and the index_fact 

60 adaptive templates. The example pre-parsed and post parsed 
SQL adaptive templates are then provided. 

As mentioned previously, the use of the semantic types 
significantly reduces the amount of work needed to imple- 
ment the datamart 150. By selecting a semantic type for a 

65 particular fact table or dimension table, the consultant auto- 
matically selects the corresponding pre-parsed SQL adaptive 
templates. The selected adaptive templates are then auto- 
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matically converted into post-parsed SQL statements that 
include the schema specific information for the data mart 
150. Additionally, these post-parsed SQL statements include 
the SQL for converting the data in the staging tables 130 into 
data that can be used in the datamart 150 tables. 
Additional Templates 

Two types of templates not described in Appendix A are 
"team" templates and "denormalized" templates. 

The team template is used to property populate an "asso- 
ciative'* dimension table. Such a table is used whenever 
there is a one-to-many relationship between an individual 
fact row and a dimension. For example, if multiple sales- 
people can split the credit for an order, one needs some way 
to represent this situation in the datamart. In a star schema, 
one normally associates a tuple of dimension values with a 
fact row (e.g. product, customer, salesrep dimensions for the 
fact row containing price, quantity etc.). Since there is only 
a single salesrep_key, one could normally have only one 
salesrep associated with this transaction. There are two 
solutions. One is to introduce multiple fact rows for a 
transaction invovling one to many relationships. If there 
were three salesreps on a specific order, there would be three 
fact rows for this order stored in the database. This has the 
disadvantage of multiplying the data size by a factor of three 
and slows queries. Also queries that are concerned with the 
total number of transactions become more difficult to pro- 
cess since duplicate rows, due to the multiplication by the 
number of salesreps, must be eliminated. 

Another solution is to introduce an associative table 
between the actual salesrep dimension table and the fact 
table. Conceptually, the associative table contains "teams**, 
of salespeople. If salesreps A, B and C often sell products 
together, they will be associated with a unique team key. The 
team key will be stored in each fact row for orders sold by 
the A, B, C team. The associative table will associate the 
team key with the three rows for A, B and C in the salesrep 
table. The associative table will have 3 rows representing 
this team (A-key, team 1 -key), (B-key, teaml-key) and 
(C-key, teaml-key). If the team of A, B, D and Q also sold 
products together, the associative table would have four 
additional rows (A-key, team2-key), (B-key, team2-key), 
(D-key, team2-key), (Q-key, team2-key). The team template 
scans the staging table used to load the fact table and 
generates the appropriate rows for entry into the associative 
table, only for those teams THAT ACTUALLY OCCUR in 
the fact rows being loaded. 

Also, if a team is already present in the associative table 
it will be reused. 

In real world situations, the number of teams that actually 
occur is much smaller than the total space of all possible 
teams. 

Note that this team template can be used wherever there 
is a one to many relationship between fact and dimension 
rows. 

Another example is in a Sales Force Automation system 
where the fact rows correspond to a sales "opportunity". 

An opportunity may be associated with the dimensions of 
Sales Lead Source, Product and Customer Contact. All of 
these may be one to many relations, amenable to the "team" 
concept. 

As mentioned above, the other type of template is the 
denormalized data template. This is a variant of the 

"Team** template where instead of introducing the extra 
associative table between the dimension and fact tables, the 
dimension table is a combination of the associative table and 
the actual dimension table. This effectively flattens the data. 
In the above example the dimension table would contain 
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rows like ("Greg Walsh", A-key, teaml-key), ("Craig", 
B-key, teaml-key), ("Ben", C-key" teaml-key), ("Greg 
Walsh", A-key", team2-key) etc. Greg Walsh is a member of 
both teams 1 and 2 and his name (and other attributes) rather 

5 than just his key (A-key) is stored twice. Used judiciously 
this can result in faster queries than the associative table case 
but results in duplicate data being stored. 

The population of both the denormalized team dimension 
and the associative table are difficult to code properly. This 
is especially true if this is done incrementally (e.g., on 
nightly extracts) and if you want to be independent of team 
order (e.g. A, B, Q is the same as (A, C, B). Thus, allowing 
the consultant to simply select this data semantic provides a 
significant improvement over previous systems. 
Runtime Metadata 

15 FIG. 5 illustrates the schema for the runtime environment 
within the system 100. The runtime schema 500 represents 
the schema description for the schema of the running 
datamart 150. That is, when the datamart 150 is created or 
modified, the schema definition is propagated into the runt- 

20 ime schema 500. Thus, the runtime schema 500 allows for 
the datamart 150 to be changed without having to rebuild all 
the tables and repopulate all of those tables. Additionally, the 
runtime metadata 500 provides the support for aggregate 
navigation. Aggregate navigation involves determining 

25 which aggregate to use in response to a query. Schema 
modification and aggregate navigation will now be 
explained in greater detail. 

The schema modification involves comparing the 
changed schema definition with the present schema defini- 

30 tion. As will be seen below, an actual table 502 keeps track 
of all of the dimension tables and the fact tables in the 
datamart 150. When a change is made to the schema 
definition, a comparison is made between the old definition 
and the new definition. The difference between these defi- 

35 nitions defines the set of tables, columns, and rows that need 
to be added, deleted or modified, in some way. Importandy, 
the modifications can often be made without losing any data 
in the datamart 150. 

The aggregate navigation process determines which 

40 aggregate most closely suits a particular query. The runtime 
metadata 160 keeps track of the aggregates available in the 
datamart 150. The query and reporting program 104 initiates 
a view of the runtime metadata 500 (in particular, the tables 
holding the aggregate tables definitions). The view results 

45 indicate which aggregates are available to answer the par- 
ticular query. The view results are further examined to 
determine the best aggregate to use (the one that most 
closely corresponds to the query). 

Importantly, the query machinery does not need to be 

50 aware of aggregates to be able to take advantage of them. 
Aggregates are simply presented to the query machinery as 
a solution to a query. 

Additionally, aggregates can use other aggregates to build 
themselves. This is because the schema definition can be 

55 used to determine the relationship between aggregates. 
Runtime Metadata List 

The runtime schema 500 includes the following elements: 
an actual table 502, an actual column 504, a fact aggregate 
table 512, a fact aggregate dimension 514, a dimension base 

60 aggregate 516, a dimension base aggregate column 518, a 
datamart letter 510, the dimension base 306, the fact table 
304, the external table 422, an actual column 504, a physical 
type definition 530, an actual table type 336, an actual 
column type 540, the physical type 330, a database physical 

65 type 595, the translation string 332, a translation actual 539, 
a store type 450, a date 560, a business process 570, an 
adaptive template profile 580, and a transaction type 590. 
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Runtime Metadata Descriptions 

The actual table 502 corresponds to metadata 160 that 
describe which dimension base and fact tables "actually" 
exist in the datamart 150. 

The actual table 502 includes the following attributes: an 
actual table key, an actual table name, an actual table type, 
a dimension base key, an external table key, a fact table key, 
an index flag, a mirror flag, and a logical table name. The 
primary key is the actual table key. The actual table name 
corresponds to the physical name of this table in the database 
implementing the datamart 150. The actual table type is the 
logical type of this physical table. For example, if this is a 
dimension staging table or a fact staging table. The dimen- 
sion base key points to the dimension base table definition 
that defined the corresponding physical table. The external 
table key points to the external table definition that defined 
the physical table. The fact table key points to the fact table 
definition that defined the corresponding physical table. The 
index flag and the mirror flag are used in indexing and 
mirroring, respectively. The logical table name defines the 
logical name for this table. 

The actual column 504 is metadata describing a physical 
column in a physical table in the datamart 150. The actual 
column table latches this definition information when the 
physical tables are built in the datamart 150. The actual 
column 504 includes the following attributes: the actual 
column key, an actual column name, an actual column type, 
an actual table key, a dimension role name, a foreign table 
key, a group by field, a hierarchy, a list order, a parent 
hierarchy, a physical type, a primary key, and a time navi- 
gation field. The actual column name is the name of the 
physical column in the physical table in the datamart 150. 
The actual column type is the logical type of the column. 
The actual table key points to the actual table in which the 
actual column fives. The dimension role name is the logical 
role name of the dimension in the fact table for dimension 
foreign keys inside of a fact table. The foreign table key 
points to the actual dimension base tables in the actual tables 
502 (the foreign table key is applicable to fact actual 
columns that are foreign keys to dimensions). The group by 
field, for dimension table, is true when this column should 
be included in an aggregate builder group. The hierarchy for 
dimension, for dimension columns, indicates that aggregate 
builder group to which this column belongs. The list order 
is the order of the column in the actual table 502. The parent 
hierarchy, for dimension columns, indicates the parent 
aggregate builder group to which this column belongs. The 
physical type is a logical data type of the column. The 
primary key, for dimension tables, is true when this column 
is the primary key of the actual table 502. The time navi- 
gation field, for the database dimension, is true if this field 
can be used by the time navigator. 

The fact aggregate table 512 includes a list of fact 
aggregates in the datamart 150. The fact aggregates includes 
attributes that point to the actual fact table in which this 
aggregate belongs. The fact aggregate table 512 indicates 
which numbered aggregate represents the fact table in 
question, the number of rows in this aggregate, a datamart 
letter, and an enabled flag. The datamart letter indicates the 
mirrored datamart to which this fact aggregate belongs. 

Mirror is used to ensure that partially completed extrac- 
tions from the source systems 110 do not cause the database 
to become inconsistent. 

The fact aggregate dimension 514 lists which aggregates 
contain which dimensions. The fact aggregate dimension 
includes the following attributes: an actual dimension role 
key, a dimension base aggregate key, a fact aggregate 
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dimension key, and the fact aggregate key. The actual 
dimension role key is the dimension foreign key in this fact 
aggregate that is being defined. The dimension base aggre- 
gate key is the dimension aggregate that this fact aggregate 

5 points to for this foreign key. The fact aggregate dimension 
key is the primary key. The fact aggregate key points to the 
fact aggregate being defined. 

The dimension base aggregate table 516 lists all the 
dimension aggregates in the datamart. The dimension base 

10 aggregate includes the following attributes: an actual table 
key, an aggregate number, an aggregate size, a datamart 
letter, a dimension base aggregate key, and an enable flag. 
The actual table key points to the physical header for this 
dimension base. The aggregate number, for the dimension 

15 table in question, is the number of this particular aggregate. 
The aggregate size is the number of rows in the aggregate. 
Hie datamart letter indicates which mirrored database this 
aggregate lives in. The dimension base aggregate key is the 
primary key. The enable flag indicates whether or not the 

20 aggregate navigator should work with this aggregate. 

The dimension base aggregate column 518 is a list of 
columns in a given dimension aggregate. The dimension 
base aggregate column includes attributes which point to 
which column is included in this dimension aggregate, and 

25 a pointer to a dimension aggregate being defined. 

The datamart letter 510 indicates which of two mirrored 
datamarts a particular aggregate belongs to. This is an 
optional element which may not be required if mirroring 
does not occur in the datamart 150. Mirroring duplicates the 

30 tables in the datamart 150. Changes can then be made to one 
copy of the datamart 150, while the other datamart 150 
continues running. These changes can then be propagated 
when possible. 

The actual column type 540 is a logical description of role 

35 a column plays in the system 100. The actual column type 
540 includes attributes that define the default value to be 
used in a database for a column of this type. 

The physical type definition 530 defines which physical 
types are allowed for which table types. The physical type 

40 definition 530 includes attributes which point to an actual 
table type. The actual table type is a logical type of a 
physical table (for example, dimension, fact etc.) being 
defined. The physical type definition also includes an 
attribute that indicates whether to select this item by default 

45 when giving the consultant or user a choice. 

The database physical type 595 defines the name of the 
physical database. 

The translation actual table 539 defines the actual values 
of translations strings for a single relational database man- 

50 agement system. These translations strings are the real 
strings to use for a given translation string within a store 
type. The translation actual table 539 also includes attributes 
that point to the store type. 

FIG. 5 also illustrates additional tables used in the system 

55 100. 

The date table 560 is used to track date information in the 
datamart 150. Importantly, times and dates are always 
treated corrected in the datamart 150. This can be guaranteed 
because the consultant cannot change the definition of dates 

60 in the datamart 150. Thus, for example, the month of 
September will always have 30 days, and leap years will be 
handled correctly. 

The transaction type table 590 is a list of the available 
transaction types within the system 100. 

65 The adaptive template profile 580 is used as a commu- 
nications mechanism for templates. The adaptive template 
profile 580 includes a number of rows being communicated 
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back to the calling program. The adaptive template profile 
580 also indicates the logical name for information being 
communicated back from an adaptive template to the calling 
program. 

The business process table 570 is a look up table for 
supported business process types during the extraction. The 
business process 570 includes a business process key and a 
process name. The process name corresponds to a logical 
name for a business process to which fact staging table 
belongs. The process key identifies a business process record 
in a fact staging table. 

Time Navigation 

An important feature of some embodiments of the inven- 
tion is the ability to compactly store and rapidly query 
"historical" backlog/balance/inventory quantities. By his- 
torical we mean the amount of backlog or inventory as it 
existed at a given point in time — not necessarily today. Note 
that backlog/balance/ioventory quantities are different than 
transactional quantities. For example, your bank balance at 
the end of Ql 1997 is not the sum of your bank balances at 
the end of January, February and March. It is computed by 
adding all of the deposit transactions and subtracting all the 
withdrawals from the balance at the end of Q4 1996. One 
could compute balances by this method when a user queries 
the system but because this method requires rolling forward 
all of the appropriate transactions "from the beginning of 
time," the queries will likely be slow. 

The traditional solution in datamarts is to store periodic 
"snapshots" of the balance. The snapshots are often stored at 
daily intervals for the recent historical past, and at greater 
intervals (e.g. weekly or monthly) for less recent history. 
This approach has two big disadvantages. The first is an 
enormous multiplication of data volume. If, for example, 
you are keeping track of inventory in a store you must store 
a snapshot for each product you hold in inventory for each 
day, even if you only sell a small fraction of all of your 
products on a given day. If you sell 10,000 different products 
but you only have 500 transactions a day, the "snapshot" 
d at am art is 20 times larger than the transactional datamart. 
The second disadvantage relates to the most common solu- 
tion for alleviating the first problem, namely storing snap- 
shots at less frequent intervals for less recent history. This 
results in the inability to compare levels of inventory in 
corresponding time periods since the same level of detail is 
not present in earlier data. For example, in manufacturing 
companies it is often the case that much business is done 
near the end of fiscal quarters. If one wants to compare 
inventory levels between Ql 1995, Ql 1996 and Ql 1997, 
and focus on the most important changes which occur near 
quarter end, one cannot use the approach of storing the 
snapshots at coarser levels of detail since daily data would 
be required. 

In some embodiments of the system, the aggregate tables 
are used to answer queries about backlog/balance/inventory 
quantities. In order to answer such queries, the previously 
described rolling forward from the beginning of time is 
done. However, this is performed efficiently through the 
accessing of the appropriate time aggregates. For example, 
assume the datamart 150 has five years of historical trans- 
action data beginning in 1993. Assume that one desires the 
inventory of some specified products on May 10, 1996. This 
would be computed by querying all of the transactions in the 
1993, 1994 and 1995 year aggregates, the 1996 Ql quarter 
aggregate, the April 1996 month aggregate, the May 1996 
week 1 aggregate and finally 3 days of actual May, 1996 
daily transactions. These transactions (additions and sub- 
tractions from inventory) would be added to the known 
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starting inventory in order to produce the inventory on May 
10. Note that this "time navigation "hops" by successively 
smaller time intervals (year, quarter, month, week, day) in 
order to minimize the number of database accesses. What is 

5 important is the exploitation of aggregate tables, that already 
exist in the system in order to answer transactional queries 
rapidly (e.g. What were the total sales of product X in April 
1996?). This avoids the need to build what is essentially a 
second data datamart with the balance/inventory/backlog 

10 snapshots. 

Query Mechanism Metadata 

The following describes the metadata 160 used in the 
query/reporting program 104. This metadata is shown in 
FIG. 6. Generally, the query mechanism metadata can be 

IS broken into ticksheet metadata, measurement metadata, fil- 
tering metadata and display options metadata. The ticksheet 
metadata defines the user interface objects for user interac- 
tion with the datamart 150. The ticksheet defines how users 
can initiate queries and how results are presented back to the 

20 user. The measurement metadata defines a logical business 
calculation that can be presented to a user. Typically, the 
measurement metadata defines a format for presenting infor- 
mation to user that is more easily understood by the user or 
provides a more valuable result to the user. The filtering 

25 metadata defines how a user can filter results. Filtering 
allows the results set to be limited to particular dimension 
values. The display options metadata defines display options 
that can be provided to the user. 

The following describes some important features of the 

30 user interface. The user interface allows the user to drill 
down through data. Also, portions of the query forms can be 
dynamic based upon values in fields (e.g., a list box can be 
dynamically updated because it is tied to a field in the 
datamart 150, that when changed, cause the values in the List 

35 box to change). Also, a query is guaranteed to be consistent 
with the schema because the query is tied to the schema 
definition. 

Query Mechanism Schema List. 

FIG. 6 includes the following elements: the constellation 
40 302, a ticksheet 602, a data set 606, a ticksheet column 608, 
a tip 601, an attribute role 603, an attribute 61 0, a ticksheet 
attribute 605, a ticksheet type 604, a measure 620, a measure 
term 630, a measure unit 624, a term operator 632, a 
transaction type 590, an RPN operator 636, the fact column 
45 310, the fact table 304, a backlog 638, a measure mapping 
622, a ticksheet column element 612, a dictionary 640, a 
filter block 650, a filter block type 652, a filter group 654, a 
filter element 656, a ticksheet type option 660, an option 
location 662, an option value 664, an option name 666, an 
50 option display type 668, an application type 691, the dimen- 
sion role 320, the dimension column 329, and the dimension 
base 306. 

Query Mechanism Schema Metadata Descriptions 
Ticksheets Metadata 

55 Under the constellation 302, the ticksheet is a top level 
object for defining the user interface objects for user inter- 
action. The ticksheet 602 table includes a data set key, a 
name, a ticksheet type, a constellation key, a label, label 
detail, a list order, a cleanse flag, and a description. The 

60 cleanse flag indicates whether or not to cleanse the filter data 
on this ticksheet. The constellation flag indicates the con- 
stellation in which the dimensions and measures for this 
ticksheet reside. The data set key indicates the page in which 
the end user makes report selections. The data set key 

65 represents a logical grouping of similar ticksheets. The 
description is for documentation purposes. The label is the 
string for the name of the ticksheet. The list order indicates 
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the ticksheet list order. The name is the hidden name of the 
ticksheet. The ticksheet includes a primary key. The label 
detail allows for more verbose documentation. The ticksheet 
type indicates the type of application that interprets the 
selections made on this ticksheet. 

The data set table 606 is a grouping mechanism for 
ticksheets into sets that describe their contents. The data set 
table 606 includes the following columns: a data set key, a 
data set name, a label, a description, and a list order. The data 
set name is a logical name for top level user definition of like 
ticksheets across ticksheet types. The description is the 
description of the data set. 

The ticksheet column 608 defines a single column for 
displaying measure choices on a ticksheet. The ticksheet 
column table 608 includes the ticksheet column key, the list 
order, the ticksheet key, and the description. These columns 
and the ticksheet column table 608 operate in a manner 
similar to such columns in other tables in this metadata. 

The ticksheet column element 612 is a single value within 
a measure display column on a ticksheet. The ticksheet 
column element 612 includes an abbreviation, a description, 
a dictionary key, a label, a list order, a name, a ticksheet 
column element key, and a ticksheet column key. The 
ticksheet column element key is the primary key for this 
table. The ticksheet column key points to the ticksheet 
column 608 entry. The name is the hidden name for the 
column element. The abbreviation is the shortened name for 
the user display. The dictionary key is the key for help text 
entry for this element. It is a key into the dictionary 640. The 
other columns act in a manner similar to columns in other 
tables of similar names. 

The tip table 601 includes a ticksheet key, a name, a 
description, and a list order. The tip is a definition of user tips 
for using a particular ticksheet. 

The attribute table 610 defines the dimension attribute 
choices within a ticksheet. These choices are tied to a single 
dimension column in the schema definition of the datamart 
150. The attribute table 610 includes an abbreviation, an 
attribute key, a dictionary key, a dimension column key, a 
dimension role key, a hyperlink, a label, a list order, a name, 
and a ticksheet key. The abbreviation is the shortened user 
string for the attribute. The attribute key is the primary key 
for this attribute. The dictionary key is a pointer to the 
dictionary 640 that includes help message for a particular 
attribute. The dimension column key is the dimension col- 
umn in which this attribute refers. For degenerate dimen- 
sions this reference is null. The dimension usage, within a 
constellation, is defined by the dimension role key. The 
hyperlink is an html text for navigating return values for this 
attribute to other web sites, such as a company name 
look-ups etc. The label is what the user sees for a particular 
attribute. The list order defines a sort order on pop-up menus 
where one is the topmost in the list. The name is the internal 
name for the attribute. The ticksheet key indicates the 
ticksheet to which this attribute belongs. 

The attribute role 603 table is a lookup table list of valid 
roles for attributes within a ticksheet type. The attribute role 
includes the attribute role key and the ticksheet type. 

The ticksheet attribute 605 indicates the roles played by 
dimension attributes within a ticksheet. The ticksheet 
attribute 605 includes the attribute key, an attribute role, a 
ticksheet attribute key, and a ticksheet key. The attribute key 
indicates the attribute in the attribute table 610 which has a 
role define on the ticksheet. The attribute role is the role 
being granted. The ticksheet attribute key is the primary key. 
The ticksheet key is the ticksheet being defined. 

TTie application type 691 is the top level grouping for 
ticksheet types. 
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The ticksheet type table 604 lists the applicable applica- 
tions that can use ticksheets. Examples may be simple 
reporting applications or relevancy applications or trend 
type of applications. The ticksheet type 604 includes the 
5 application type, a ticksheet type, a template, and a ticksheet 
type name. The application type is the definition of the high 
level application in which a particular ticksheet type resides. 
The ticksheet type is the logical name for a program that 
interprets ticksheets. The template is the template for the 
10 ticksheet. The ticksheet type name is the name displayed for 
a particular ticksheet type. 

Measurement Metadata 

The following describes the measures used in the query 
mechanism schema 600. The measure table 620 defines a top 

15 level object for a logical business^ calculation. The measure 
table 620 includes a constellation key, a description, a 
measure key, a measure unit, and a name. The constellation 
key points to the constellation in which the measure resides. 
The description is for documentation purposes. The measure 

20 key is the primary key for the measure table 620. The 
measure unit is an indicator of the manner in which numbers 
are to be displayed. The name is the logical name of the 
measure. 

The measure unit table 624 is a lookup table of the valid 
25 list of measure unit types. An example of such unit types is 
currency. 

The measure term table 630 indicates a single component 
of a measure. The measure term can be combined arithmeti- 
cally to construct a measure. The measure term table 630 

30 includes a backlog type, a fact column key, a fact table key, 
a list order, a measure key, a measure term key, an RPN 
operator, a term operator, and a transaction type key. The 
backlog type indicates the type of backlog operation to use 
for a particular term (e.g., "beginning of period*' and "end of 

35 period*'). This can possibly be none. The backlog types are 
defined in the backlog type table 638. The fact column key 
points to the particular numeric column to operate on in the 
fact table. The fact table key indicates the fact table being 
operated on. The list order is the order of this term in the 

40 measure 620. The measure key is the measure being defined. 
The measure term key is the primary key for this table. The 
RPN operator is for the measure terms that perform arith- 
metic operations on other terms. (The RPN operator table 
lists the valid arithmetic operations to use when constructing 

45 a measure.) The term operator is an SQL operator to use on 
a set of fact rows. (The term operator table 632 indicates the 
valid set of SQL operators to use on a measure term.) The 
transaction type is the transaction type values to filter on for 
the fact in question. 

50 The relation between measures and ticksheets is handled 
through the measure mapping table 622. The measure map- 
ping table 622 includes the measure key, the ticksheet key, 
and the combination ID. The measure key points to the 
particular measure that is related to the particular ticksheet. 

55 The combination ID identifies a set of ticksheet column 
elements being defined. 
Filtering Metadata 

Filtering allows results to be limited to only particular 
dimension values. For example, a user may want to limit the 

60 results to particular customer names. 

The following describes the filtering tables. The filter 
block 650 is a top level grouping table for filter area within 
a ticksheet. The filter block 650 is tied to a particular 
dimension column in the schema definition. The filter block 

65 650 includes, columns, a description, a dictionary key, a 
dimension column key, a dimension role key, a filter block 
key, a filter block type, a label, a list order, a name, a plural, 
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a mapping flag, and a licksheet key. The columns field 
indicates the number of columns in this filter block. The 
description is for documentation. The dictionary key points 
to the help dictionary. The dimension column key points to 
the actual column name and the datamart to be filtered on. 
A null value here means degenerate dimension as deter- 
mined by the dimension role key. The dimension role key 
points to the dimension role in the constellation of the 
ticksheet that is the form key to filter on for all facts in this 
constellation. A null value here means that a special dimen- 
sion shared by all constellations is being used. The filter 
block key is the primary key for this table. The filter block 
type points to the filter block type table 652 which defines 
the ways in which this filter block is displayed to the user 
(e.g., a checkbox or a radio button). The label is the text that 
appears to the user for the filter block. The list order is the 
order that the filter block should appear in a list. The name 
is the name of the filter block. The plural field is the text that 
appears to the user for the filter block. The mapping flag is 
used in mapping. The ticksheet key points to the ticksheet 
that this filter block belongs. 

The filter group 654 is a mid-level grouping for a filter on 
a ticksheet. This groups individual selections into logical 
units. 

The filter element table 656 defines individual values for 
a dimension attribute within a filter block. The filter element 
table 656 includes a dictionary key, a filter element key, a 
filter group key, a label, a list order, a name, an SQL 
statement, and a value. The dictionary key points to the user 
help text for a particular filter element. The filter element key 
is the primary key. The filter group key points to the filter 
group to which this element belongs. The label is the user 
displayed string for the element. A list order is the order of 
this element within a filter group. The name is the hidden 
name of this element. The SQL statement is an SQL state- 
ment used to build the list of values for a dynamic list box 
filter group. The value is the database value that this element 
translates into in a SQL "WHERE" clause. 

Display Options Metadata 

To control the method in which information is displayed 
with a ticksheet, a set of options are supplied. The ticksheet 
type option table 660 helps support this feature. The tick- 
sheet type option table 660 includes a list order, an option 
display type, an option location, an option name key, a 
ticksheet type, and a ticksheet type option key. The list order 
is the order to display options for a licksheet type. The 
option display type is an html control type to use when 
displaying a particular option. The option location is the 
location of the option on the ticksheet. (The option location 
table 662 holds the list of possible locations of options on a 
ticksheet.) The option name key is the option being included 
in a ticksheet type. (The option name table 666 defines an 
option that has meaning for one or more applications that 
can be used by users.) The ticksheet type is the ticksheet type 
being defined. The ticksheet type option key is the primary 
key for this table. 

The option name table 666 includes option name key, a 
name, a label, and a dictionary key. The option name key is 
the primary key. The name is the logical name for the option. 
The label is the label seen by the user as the name of the 
option. The dictionary key is the pointer to the help text 
dictionary. 

The option value table 664 defines single valid values for 
options. The option value table 664 includes the dictionary 
key, a label, a list order, an option name key, an option value 
key, and a value. The dictionary key is the help text 
dictionary key. The label is the label seen by the user for this 



,103 

34 

option choice. The list order is the order of the valid values 
for the option. The option name key is the option set being 
defined. The option value key is the primary key for this 
table. The value is the hidden value for the option. 

The option display type table 668 is a lookup table 
indicating the valid way that options can be displayed. 

The dictionary table 640 is a table for help text for users. 

User Interface Example of Defining Metadata 
General Schema Definitions User Interface 

10 The following describes a constellation used in a busi- 
ness. In this example a new dimension is added very simply 
and the changes are automatically propagated into the data- 
mart 150. The enterprise manager interface 192 is used by 
the consultant to define and manipulate the system 100. 

15 FIG. 7 illustrates the enterprise manager interface 192. 
Multiple system 100' s can be connected to through that 
interface. Many of the objects and tables in the system 100 
are shown. The base dimensions definitions 710 correspond 
to the base dimensions available under the "epitest" data- 

20 mart. The constellations 712 for this datamart include an 
expense constellation and a sales constellation 720. Thus, 
the sales constellation 720 would appear as a row in the 
constellation table 302. Under the sales constellation 720 
appear the definitions for the sales aggregates 721, the sales 

25 dimensions 723, the sales degenerate dimensions 725, the 
sales facts 726, the sales measures 728, and the sales 
ticksheets 729. Also, the extraction definitions 740 and 
security definitions for the "epitest" datamart are accessible. 
The sales dimensions 723 define rows in the dimension role 

30 table 320. These rows include customer billed to, product, 
application, program, customer ship to, and territory. 

FIG. 8 illustrates the dimension table definition window 
800 (presented to the consultant as the result of selecting the 
customer billed to dimension role under dimensions). A 

35 dimension table definition window 800 show that the dimen- 
sion 820 is customer bill to and the base dimension 810, to 
which the dimension 820 points to, is named customer. 

FIG. 9 illustrates a base dimension window 900 showing 
the definition of the base dimension 810 named customer. In 

40 this case, the customer base dimension has a "slowly chang- 
ing dimensions" dimension data semantic 910. In this 
example, the dimension base 810 customer has a number of 
dimension columns 920. LI is an example of a dimension 
node. 

45 FIG. 10 illustrates the dimension column window 1000 
for the customer region code column 1010. The physical 
type is the type of data defining that dimension column. The 
VARCHAR_50 physical type is then mapped to an actual 
type through the physical type table 330. The translation is 

50 dependent on the store type. 

FIG. 11 illustrates the base dimension window for the 
base dimension date (substantially un-editable). The user 
interface indicates that the date dimension is not an editable 
base dimension (shown as black circles under "Base 

55 Dimensions"), and grayed out in the base dimension win- 
dow 900. The transtype (transaction type) is similarly not 
editable and is similarly shown in the user interface. 

FIG. 12 shows the dimension column "date day quarter 
end". Note that column cannot be edited. 

60 FIG. 13 illustrates a fact table window 1300 that is open 
on the order fact table 1310 definition. The fact data seman- 
tic 1310 is transactional/state like/force close/unjoined. The 
transactional/state like/force close/unjoined means that the 
invoice part of an order is transactional, the booking is state 

65 like, orders that are not otherwise dealt with, are closed out, 
and the data may become dirty and so it needs to be 
cleansed, thus, it is unjoined. This semantic type is described 
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in detail in Appendix A. Note that the user can select from 
many different types of fact table semantics. The fact table 
window 1300 also shows the fact columns 1330 for the order 
fact table. 

FIG. 14 illustrates a fact column window 1400 opened on 5 
the definition of the net — price fact column 1410. Here the 
fact column 1410 has a physical type 1420 called FACT- 
MONEY and an aggregate operation 1430 called a SUM. 

FIG. 15 illustrates how the consultant would select the 
semantic type for the order fact table 1310. 10 

FIG. 16 illustrates the results of a request by the consult- 
ant to generate the datamart 150 from the definitions of the 
datamart. The results show that a number of tables have been 
created in the datamart 150. Importantly, FIG. 16 illustrates 
the results of an initial build process. In subsequent 
modifications, only those elements of the datamart that have 
changed will be changed. In other words, the subsequent 
changes are handled as an update process. An example of the 
update process is described below. 
Extraction Interface Elements 

The following describes the creation of the connectors 
162. Once the schema definitions 161 re set, the consultant 
then defines the connectors 162 to the source systems 110. 
The connectors, as noted above, define how information is 25 
to be extracted from the source systems U0 and how that 
information is to be placed into the datamart 150. 

These connectors are defined under the extraction defini- 
tions 740. FIG. 17 illustrates the job definition window 1700 
presented when the consultant has selected a particular job. 30 

FIG. 18 illustrates the job steps 1810 within the default 
job. The checkbox indicates whether the particular job step 
is enabled for that job. The list of job steps is shown in the 
order that they are executed. The two foreign keys within the 
job step are shown in the dialog box of FIG. 17 to indicate 35 
whether the job step is a connector or a system call. 

FIG. 19 illustrates the All Semantics connector as defined 
in the connector definition window 1900. This connector 
includes the description and a definition of the input and 
output data stores. In this case, both of the data stores are the 40 
"epimart" (which is the datamart 150). 

FIG. 20 illustrates the data store window 2000 interface 
for showing a data store. This is the data store that is 
referenced in the connector All Semantics. 

Returning to the discussion of connectors, FIG. 21 illus- 
trates the connector entitled MFG. The MFG connector has 
two major steps: (1) order dimension staging, and (2) order 
fact staging. The results of these extraction steps are put in 
the staging tables 130. (The all extraction steps window 5Q 
2100 illustrates all the possible steps in the system that can 
be used.) 

FIG. 22 illustrates the SQL statement window 2200. The 
SQL statement window 2200 has an SQL field 2210 that 
includes the SQL statements that loads a customer table. As 55 
shown in the dialog box, the table references for the SQL 
statement includes the customer dimension table column 
definitions. That is, this SQL statement is going to be used 
to populate the customer dimension table. 

In this example, the base name, type code, type name, 60 
region code, region name and tier name corresponds to the 
column names within the customer dimension. The date 
modify is an additional field that is to be used to indicate 
when this field was last modified in the database. 
Additionally, there is a source system key that is automati- 65 
cally included in every dimension. The source system key 
helps ensure that the datainart 150 is well-formed. 
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In one embodiment of the invention, these names can be 
automatically propagated into the SQL field 2110 window 
via a template that is generated from the corresponding base 
dimension. This allows the consultant to more easily define 
the SQL selection statement. 

FIG. 23 illustrates the SQL statement for the Open Order 
Stage for populating the order fact table. 

At this point the steps for generating the staging table 
information are complete. Now the semantic conversion 
steps are defined. 

In FIG. 24, returning to the connector steps window, we 
have switched to an All Semantics connector 2410. The All 
Semantics connector 2410 causes the semantic conversion 
of the information in the staging table for use in the datamart 
150. 

FIG. 25 illustrates the semantic transformation window 
2500 showing the dimension table customer semantic 2510. 

FIG. 26 illustrates the order fact semantic 2610 definition. 

FIG. 27 illustrates the results of a consultant adding a new 
dimension 2700 (called warehouse) to the sales constellation 
720. The batch operation window 1600 illustrates the 
changes that are being made to the datamart that was created 
in FIG. 16. To achieve these results, the consultant need only 
perform the following steps: 

1. Define the new dimension. 

2. Define the connector steps, including the SQL State- 
ment to extract the warehouse data from the source 
systems U0. 

3. Add the warehouse information to the Open Order 
Stage SQL Statement. 

4. Define a semantic transformation for the warehouse, 
e.g., slowing changing dimension. 

5. Have the enterprise manager 102 update the datamart 
150. 

Thus, changing the schema definition of the datamart 150 
is significantly simpler than previous systems. 
Additional Interface 

FIG. 28 illustrates the aggregate group window 2800, 
where aggregates can be defined. For a given aggregate 
group, the consultant can define which fact share the 
aggregate, and which type of aggregate should be built for 
a given dimension in the aggregate. Additionally, dimen- 
sions can be added to, or removed from, an aggregate group. 

FIG. 29 illustrates a portion of the configuration window 
2900. In this example, a partial list of the transaction types 
2910 is shown. Thus, the consultant can determine which 
transaction types will be available to him/her. 
End User Interface Definition and Example 

FIG. 27 illustrates the interface used to define a user 
interface for the end user. FIG. 30 includes a user interface 
definition window 3000 which can be used to define mea- 
sures and ticksheets. In this case, the measure definition 
window 3010 is shown. 

The measure definition window 3010 allows the consult- 
ant to define which measures will be available in the system. 
The consultant defines the name, units, and constellation for 
a particular measure. The measure is further defined by 
defining the list of measure terms that make up a measure 
(the calculations for the measure 3020). In this example, the 
ASPBacklogOrderGross measure has seven calculation 
steps, some of them arithmetic (e.g., SUM) and others RPN 
(Reverse Polish Notation). 

FIG. 31 illustrates the ticksheet definition window 3100. 
The ticksheet definition window allows a consultant to 
define a ticksheet that will be used to generate a query form 
for a user. The consultant defines the attributes, the columns, 
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and the filters for a ticksheet. FIG. 32 illustrates the query 
form 3200 generated from the ticksheet defined in FIG. 31. 

FIG. 33 illustrates the measure mappings window 3300, 
that allows the consultant to map measure definitions to user 
friendly measure names. In the example of FIG. 33, the 
PriceShipGrossMonth measure is mapped to a combination 
of the dollar amount, gross, and sell-through being selected 
in the query form 3200. 

FIG. 34 illustrates another query form 3200 generated 
from a different ticksheet definition. When the user selects 
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the create report button, the query is issued against the 
datamart 150. FIG. 35 illustrates some sample results from 
such a query. 

The following query log illustrates the actual query that 
5 was executed against the datamart 150. The query log 
illustrates that an aggregate and navigation process deter- 
mined which aggregate would be the most appropriate. The 
aggregate builder had created these aggregates. The most 
appropriate aggregate for the requested query was selected. 
The results were then returned. 



Query log 



time : <A Date Hero 
addr : 192.0.0.210 
host : 192.0.0.210 
user : 

agent : MozilIa/4.01 [en] (WinNT, U) 

Datebase information: DRIVER={SQL SERVER); SERVER=bigfoot; DATABASE=macromedia 
Keys and values coming in from the browser: 

file - 

fileDesc = 

queryaction = QUERY 
hidden_queryaction - QUERY 
OK_calIback = 
NOK_callback = 
ticksheet - Orders 
hidden_ticksheet = Orders 
Rows = Customer 
hidden_Rows - Customer 
Columns = Fiscal Year 
hidden_Columns = Fiscal Year 
units - Price 
hidden_units = Price 
facttype = Shipped 
hidden_facttype - Shipped 
facttype2 = Gross 
hidden_£acltype2 = Gross 
stage - Orders 
hiddcn_stagc = Orders 
currencyunits = Thousands 
hidden_currencyunits - Thousands 
rowtotal = yes 
hidden_rowtotal = yes 
cohimntotal = yes 

hidden columntotal - yes 

percent = none 
hidden__percent «* none 
precision - 0 
hidden_precision = 0 
charts = 3D 

hidden charts - 3D 

maxrows = 10 
hidden_maxrows = 10 
rowsorttype — value 

hidden rowsorttype = value 

Fi sea 1_ Years = All 

hidden FLscal__ Years - All 

Fiscal__ Quarters = All 

hidden Fiscal_Quarters = All 

Calendar_Months - All 
hiddeu_Calendar_Months = All 
Business_Units - All 

hidden Busincss_Units = All 

Product_lines - All 
hidden_ProducL_Lines = All 
Product_Supergroups = All 
hiddcn_ProducL_Supergrpups = All 
Platforms = All 
hiddcn_Platforms = All 
ProducL_Languages = All 
hidden__Product_ Languages = All 
Product_SKUs = All 
hidden__Product_SKUs - All 
Product_SKU = 
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Sales_Reps ° All 

hidden Sales_Reps = All 

Channels - All 
hidden_Chanttcls - All 
Customer_Types = All 
h idde n__Cus to m e r_Types - All 

Customer Regions = All 

hidderL_Customer Regions = All 

Customers - All 

hidden Customers = All 

Customer = 
sqStyle - classic 
hidden_sqstyle = classic 
Contents of %FonnData: 
Columns - Fiscal Year 
rowsorttype ** value 
ticksheet = Orders 
Customers - All 
charts -» 3D 
Customer Types = All 
Product SKUs - All 
Customer Regions - All 
Fiscal Quarters - All 
Rows - Customer 
Channels - All 
precision = 0 

Product Supergroups = All 

percent - none 

maxrows = 10 

Sales Reps = All 

columntotal - yes 

Calendar Months - All 

queryaction = QUERY 

sqStyle - classic 

Fiscal Years - Alt 

Product Languages « All 

Platforms - All 

Product Lines = All 

rowtotal = yes 

currencyunits - Thousands 

Business Units « All 
The colheaders are: 
The Cell items are: 

Price Shipped Gross Orders 
The Cellitems abbreviated are: 

Dollar Amount Shipped Gross Orders 
pid is: 310 
spid is: 25 

The valid coUbeaders are: j 
The invalid colheaders are: 
The valid cellitems are: 

Price Shipped Gross Orders 
The invalid cellitems arc: 
The units tact is: 

CURRENCY 
The ccllstack is: 

-SUN (Order.net_price) 
The selects tack is: 

-SUM (Order.net_prtce) 
The typestack is 

SHIP 
Tran slypes 

"BEG[N_RETURN" - 1007 

"END_GROSS" - 1004 

"END_SRBOTH" - 1018 

"END_SRADJ" - 1012 

"BOOK" - 1 

"BEGIN_ANET" - 1013 

"END_IADr = 1024 

" END_ICDNP" - 1022 

H BEGIN_GROSS" - 1003 

"BEGIN_SRBDTH" = 1017 

"END_SBOTH" . 1016 

"END" - 1002 

M BEGIN_SRADr' - 1011 

"END_SADJ" = 1010 

"BEGCSLIADr - 1023 

"BEG£N_ICOMP" = 1021 

**END__NET* ~ 1006 
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"SHIP_ADJUST" » 103 
44 LOST' = 3 
W END_SALL" - 1020 
"BEGIN_SBOTH" = 3035 
"BEGIN" o 3001 
"BEGIN_SADT' - 1009 
M BOOK„RETURN" = 2 
"END_FADT - 3026 
"BEGIN_NET" - 3005 
a B EGIN_S ALL" - 1039 
"GL" = 105 

"SHIP_RETURN" - 302 

"SHIP_RADJ > ' » 304 

"SHIP" = 101 

"END_RETURN" - 1008 

"INV_ADJUST' « 201 

w LEAD_LOST" = 4 

" BEGIN_FADJ" - 1025 

"FINV _ADJUST" - 202 

"END_ANET* = 1014 
The facttable is: 

Order 
The limitvalues are: 
The access limitations are: 
The limitvalues arc: 
The header took 261 milliseconds. 
Parsing took 160 milliseconds. 
Generating the header took 0 milliseconds. 
Building user limits took 0 milliseconds. 

Phase 0: Aggregate navigator initialization. 

Phase 1: Getting dimensions from SQL. (10) 

Phase 2: Getting fields available from SQL (90) 

Phase 3: Getting degenerate fields from SQL (0) 
Phase 0: Preparing for query building and execution. R2[0 — 0] - (SUM(ZO)) 
R1[0] » SUM(Z0) 
The column router: 

Cell location 0 will be returned in column 0 when Type is SHIP. 
The result router: 

Result location 0 is (SUM(Z0)) 0 
The unique facttables are Order 
The number of unique facttables arc: 1 
The unique types are:SHIP 
Table to unique number lookup 

Order => _0_ 
Begin work on the query based on the facttable Order 
Phase 1: Table Order creating table aliases (30) 
JOIN_FIELD IS CUSTOMERS ILLTO_KEY FOR Customer 
JOINJTELD [S FOR Fiscal Year 
SQL table aliases: 

Orderd: T3 

Dated: T2 

CustomerQCUSTOMER_BILXTO„_KEY : Tl 
Table aliases: 

tablealiaslookup(Tl) = Customer 

tablealiaslookup(T2) = Date 

tabIealiaslookupfT3) = Order 
selectalias: 

Customer: H.base_name 

Fiscal Year: T2.fy_name 
selects tackalias: 

-SUM(Order.net_price): -SUMCT3.net_price) 
joinalias: 

Customer: Tl .customer _key = T3.customer_billto_key 
Phase 2: Building SELECT clause (0) 
Phase 3: Building FROM clause (0) 
Phase 4: Building WHERE clause (0) 
Phase 5: Building GROUP BY clause (0) 
SQL before going through the aggregate navigator: 
SELECT 
Columns = T2.fy_name, 
Type - T3.Transtype.jkey, 
CO - -SUMCT3.net_pricc) ) 
Rows = Tl.base name 
INTO #tmp_0_ 
FROM 
Customer Tl, 
Date T2, 
Order T3 
WHERE 
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Tl.customer_key = T3.customer billto_key and 
T2.date _key = T3.date_key and 
T3.Transtype_key in (101) 
GROUP BY 
Tl.base _jaame, 
T2.fy_oame, 
T3.Transtypc_kcy 

mm****** ********** ***rm**mm***mwmr*w*m**+**i"m9*****m>********* + ******* 

Selecting appropriate aggregate for the query. 



Phase 0 Aggregate navigator. Preparing for query building and execution. 

Phase 1 Spliting query into clauses. (0) 

Phase 2 Construction of aliases. (0) 

Phase 3 Extracting neededfields from where clause. (0) 

Phase 4 Extracting neededfields from group by clause. (0) 

Phase 5 Extracting neededfields from select clause. (0) 

Phase 6 Unaliasing. (0) 

Phase 7 Constructing the SQL to fetch smallest aggregate. (20) 
Phase 8 Running the big SQL. (20) 
Phase 9 Extracting results from the big SQL. (0) 
Phase 10 Adjusting input with aggregate information. (0) 
Phase 6: Aggregate Navigating (40) 



Appropriate aggregate determined (CUSTOMER_0, DATE_4, ORDER_86), now select 

a************************* *********************** *** ******* ********* » 

SQL after going through the aggregate navigator: 
SELECT 

Columns - T2.fy_namc, 

Type = T3Transtype_key, 

CO ° -SUM(T3.net_price), 

Rows - Tl.base_name 

INTO #tmp_0_ 
FROM 

CUSTOMER__0 Tl, 

DATE_4 T2, 

Order„85 T3 
WHERE 

Tl.customer_kcy = T3.customcr„billto_key and 

Tl.date^ key = T3.date_Jcey and 

T3 Trans type_key in (101) 

GROUP BY 

Tl.base_name J 

T2.fy name, 

T3.Transtypc_key 

Phase 7: Building results table in sql (10435) 
Phase IS: Splitting tables bytype (needed - 0) (0) 

Phase 16: Merging results into one table (needed = 0) (0)GR_COLS = CO = SUM(CO) 

SQL: SELECT Rows [NTO #tmpAllRows FROM #tmp_0_ GROUP BY Rows ORDER BY SUM(CO) DESC 
SQL: SELECT count(Rows) FROM #tmpAllRows 

Phase 17: Extracting rows and doing number of total records (10508/10498) (1022) 
SQL: set rowcount 10 

SELECT Rows INTO #tmpTopRows FROM #tmpAllRows 
set rowcount 0 

Phase 19: Sorting, TOP (needed = 10498) (10) 

— creating row totals 
SELECT 

#tmp 0 .Rows, 

CO - SUM(CO) 
INTO #tmpRows 
FROM #tmp_0_ #tmpTopRows 
WHERE #tmp_0_.Rows « tftmpTop Rows. Rows 
GROUP BY #tmp_0_Row3 

— creating col, grand totals 
SELECT 

Columns, 

CO - SUM(CO) 
INTO #unpColumns 
FROM #tmp_D 
GROUP BY Columns 
Checking ADJ of GRAND 

SQL: SELECT CO - SUM(CO) INTO #tmpGrand FROM #tmpColurnns 

— final results table 

SELECT #tmp_0_.Rows, Columns, CO 

INTO #tmpFinaIResults 

FROM #tmp_0_ #tmpTopRows 

WHERE #tmp_0_.Rows = #tmpTop Rows. Rows 

Phase 20: Filtering results (841) 

Phase 21: Reading Row Totals (10) 

Phase 22: Reading Column Totals (10) 
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Phase 23: Reading Grand (10) 
SQL: 

— calculate remaining columns 
SELECT 

Columns = #tmpColumns. Columns, 

CO - #tmpColumns.CO - ISNULL(SUM(#tmpFinalResults.CO),0) 

INTO #tmpRemaining 

FROM #tmpFinalResults, #tmpCo!umns 

WHERE #tmpColumns.Columns - #t mp Final Resu 1 ts . Co lum ns 

GROUP BY #tmpCoIumns. Columns, #tmpColumns.C0 

select CO = SUM(CO) from #tmpRemaining 

Phase 24: Reading Remaining Column Totals (needed - 10498) (20) 
Phase 25: Final Results (30) 

Phase 26: Sorting columns by time (if necessary). (20) 

Phase 27: sorting rows by time or name (if necessary). (0)ERR FROM BUILD_AND_EXEC:0 

Getting results took 12658 milliseconds. 

Phase D: Beginning output generation. 

Phase 3: performing cumulative (if necessary). (0) 

Dollar Amount / Shipped / Gross 

The columns are: 1994 

1995 

1996 

1997 

1998 

The rows arc: ******* A number of customer names here ************* 

The table headers are: 

The contents of the results array are: 



A number of customer aame, value pairs 



The contents of the rowtotal array are:< 

key:. ****.*Rcw totals" ••**< 
The contents of the coltotal array are: 

1995: *****»An amount****** 

1996: ******An amount****** 

1997: ******An amount****** 

1998: *****-An amount****** 

1994: ******An amount****** 
The contents of the grdtotal array are: 

Grand: ******* A grand total amount******* 
Processing and formatting results took 220 milliseconds. 
Total time was 13299 milliseconds. 
Processing Bylk took 81 milliseconds. 



FIG. 36 illustrates the options form 3600 that the user can 
use to select the display options for a result. 

Alternative Embodiments 

The following describes alternative embodiments of the 
invention. 45 

Importantly, various embodiments of the invention do not 
necessarily include all of the features described above. For 
example, some embodiments of the invention do not include 
the first phase of the extraction process (loading the staging 
tables) because the source system data is provided to the 50 
system 100 directly by other extraction programs. Another 
example is where the datamart 150 is created, but a separate 
query interface is used to query the datamart 150. The query 
interface could use only a different communications protocol 
(e.g., instead of HTTP), or could be a completely different 55 
front end. 

Other embodiments of the invention are configured dif- 
ferently than the embodiments described above. For 
example, the extraction node key in the semantic instance 
table 308 is not included, but the extraction node 410 60 
includes a semantic instance key. 

Some embodiments of the invention build a database 
system, not necessarily a datamart. Additionally, these 
embodiments do not have to conform to a star schema 
definition in the metadata 160. 65 

An object database system could be generated instead of 
a relational database system. 



Some embodiments of the invention comprise only a 
computer readable media (e.g., a CD, a tape, a hard drive or 
other storage media) that has the programs that implement 
all, or a portion of, the system 100. Some embodiments of 
the invention include an electromagnetic waveform having 
the programs. Some embodiments of the invention include 
only the computer system running the datamart, other 
embodiments of the invention include only the computer 
system that creates, accesses, and queries the datamart, but 
does not include in the datamart itself. 

Additional, or different, data semantics can be included in 
other embodiments of the invention. 

Some embodiments of the invention include different user 
interfaces for the enterprise manager interface 192 and the 
query/results interface 184. For example, in the enterprise 
manager interface 192, the semantic types need not be 
selected in the fact and dimension base windows, but can be 
selected in the semantic instance window. 

What is claimed is: 

1. A method of generating a datamart having aggregates 
using a computer system, the method comprising: 

accessing a schema definition which describes a schema 

for the datamart; 
accessing a description of a set of aggregates to be 

generated in the datamart; 
generating a set of commands from the schema definition, 

including, 
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generating a set of table creation commands, and 
generating a set of table access and manipulation 
commands, the set of table access and manipulation 
commands corresponding to the semantic meaning 
of the schema; 
generating a set of generate aggregates commands; and 
generating a set of aggregate tables using the set of 
generate aggregates commands. 

2. The method of claim 1 further comprising accessing 
data in the data mart, the description of the schema, and the 
description of the set of aggregates to populate the set of 
aggregate tables. 

3. The method of claim 1 wherein the description of the 
set of aggregates is stored in a set of aggregate description 
tables, and wherein the generating the set of aggregate 
commands includes performing a query on the set of aggre- 
gate description tables. 

4. The method of claim 1 wherein the description of the 
set of aggregates defines a group of related aggregates. 

5. The method of claim 1 wherein the description of the 
set of aggregates defines a set of fact tables that share an 
aggregate. 

6. The method of claim 1 wherein the description of the 
set of aggregates defines an aggregate type for each dimen- 
sion that corresponds to an aggregate. 

7. A method of querying a datamart comprising: 
accessing a definition of a schema for the datamart and a 

definition of a set of aggregates for the datamart; 
generating the set of aggregates for the datamart from the 
definition of the schema and the definition of the 
aggregates; 

determining at least an aggregate of the set of aggregates 
that most closely corresponds to the query, the query 
corresponding to the schema definition; 
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querying the datamart for the data corresponding to the 
aggregate. 

8. The method of claim 7 wherein the definition of the set 
of aggregates is stored in a set of tables in a database, and 

5 wherein determining the aggregate includes performing a 
view on the set of tables to determine a possible set of 
aggregates that can be used to answer the query. 

9. The method of claim 7 wherein determining the aggre- 
10 gate includes time navigation of the set of aggregates. 

10. The method of claim 7 wherein the query corresponds 
to a backlog query and wherein determining the aggregate 
includes determining a subset of the aggregates that can be 
combined to generate a backlog result. 

15 11. A system for generating a datamart having aggregates, 
the system comprising: 

a data store for storing a description of a schema for the 
datamart; 

20 a first program for accessing a description of a set of 
aggregates to be generated in the datamart, the first 
program further for generating a set of commands from 
the schema definition, the first program further for 
generating a set of table creation commands, and for 

25 generating a set of table access and manipulation 
commands, the set of table access and manipulation 
commands corresponding to the semantic meaning of 
the schema; 

30 a second program for generating a set of generate aggre- 
gates commands, and the second program for generat- 
ing a set of aggregate tables using the set of generate 
aggregates commands. 

***** 
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